Getting started with Spice.ai
Follow these steps to get started with Spice.ai.
Step 1. Install the Spice CLI:
curl https://install.spiceai.org | /bin/bash
Step 2. Initialize a new Spice app with the spice init
command:
spice init spice_app
A Spicepod.yaml
file is created in the working directory.
Step 3. Connect to the sample Dremio instance to access the sample data:
spice login dremio -u demo -p demo1234
Step 4. Start the Spice runtime:
spice run
Example output will be shown as follows:
Spice.ai runtime starting...
Using latest 'local' runtime version.
2024-02-21T06:11:56.381793Z INFO runtime::http: Spice Runtime HTTP listening on 127.0.0.1:3000
2024-02-21T06:11:56.381853Z INFO runtime::flight: Spice Runtime Flight listening on 127.0.0.1:50051
2024-02-21T06:11:56.382038Z INFO runtime::opentelemetry: Spice Runtime OpenTelemetry listening on 127.0.0.1:50052
The runtime is now started and ready for queries.
Step 5. In a new terminal window, add the spiceai/quickstart
Spicepod. A Spicepod is a package of configuration defining datasets and ML models.
spice add spiceai/quickstart
The Spicepod.yaml
file will be updated with the spiceai/quickstart
dependency.
version: v1beta1
kind: Spicepod
name: PROJECT_NAME
dependencies:
- spiceai/quickstart
The spiceai/quickstart
Spicepod will add a taxi_trips
data table to the runtime which is now available to query by SQL.
2024-02-22T05:53:48.222952Z INFO runtime: Loaded dataset: taxi_trips
2024-02-22T05:53:48.223101Z INFO runtime::dataconnector: Refreshing data for taxi_trips
Step 6. Start the Spice SQL REPL:
spice sql
The SQL REPL inferface will be shown:
Welcome to the interactive Spice.ai SQL Query Utility! Type 'help' for help.
show tables; -- list available tables
sql>
Enter show tables;
to display the available tables for query:
sql> show tables;
+---------------+--------------------+-------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------------+-------------+------------+
| datafusion | public | taxi_trips | BASE TABLE |
| datafusion | information_schema | tables | VIEW |
| datafusion | information_schema | views | VIEW |
| datafusion | information_schema | columns | VIEW |
| datafusion | information_schema | df_settings | VIEW |
+---------------+--------------------+-------------+------------+
Query took: 0.004728897 seconds
Enter a query to display the most expensive tax trips:
sql> SELECT trip_distance_mi, fare_amount FROM taxi_trips ORDER BY fare_amount LIMIT 10;
Output:
+------------------+-------------+
| trip_distance_mi | fare_amount |
+------------------+-------------+
| 1.1 | 7.5 |
| 6.1 | 23.0 |
| 0.6 | 4.5 |
| 16.7 | 52.0 |
| 11.3 | 37.5 |
| 1.1 | 6.0 |
| 5.3 | 18.5 |
| 1.3 | 7.0 |
| 1.0 | 7.0 |
| 3.5 | 17.5 |
+------------------+-------------+
Query took: 0.002458976 seconds
Next Steps
You can use any number of predefined datasets available from Spice.ai in the Spice Runtime.
A list of publically available datasets from Spice.ai can be found here: https://docs.spice.ai/building-blocks/datasets.
In order to access public datasets from Spice, you will first need to create an account with Spice.ai by selecting the free tier membership.
Navigate to https://spice.ai/ and create a new account by clicking on Try for Free.
After creating an account, you will need to create an app in order to create to an API key.
You will now be able to access datasets from Spice.ai. For this demonstration, we will be using the Spice.ai/eth.recent_blocks dataset.
Step 1. In a new directory, log in and authenticate from the command line using the spice login
command. A pop up browser window will prompt you to authenticate:
spice login
Step 2. Initialize a new project if you haven’t already done so. Then, start the runtime:
spice init my_spiceai_project
spice run
Step 3. Configure the dataset:
In a new terminal window, configure a new dataset using the spice dataset configure
command:
spice dataset configure
You will be prompted to enter a name. Enter a name that represents the contents of the dataset
dataset name: (default) eth_recent_blocks
Enter the description of the dataset:
description: eth recent logs
Enter the location of the dataset:
from: spice.ai/eth.recent_blocks
Select y
when prompted whether to accelerate the data:
Locally accelerate (y/n)? y
You should see the following output from your runtime terminal:
2024-02-21T22:49:10.038461Z INFO runtime: Loaded dataset: eth_recent_blocks
Step 4. In a new terminal window, use the Spice SQL REPL to query the dataset
spice sql
SELECT number, size, gas_used from eth_recent_blocks LIMIT 10;
The output displays the results of the query along with the query execution time:
+----------+--------+----------+
| number | size | gas_used |
+----------+--------+----------+
| 19281345 | 400378 | 16150051 |
| 19281344 | 200501 | 16480224 |
| 19281343 | 97758 | 12605531 |
| 19281342 | 89629 | 12035385 |
| 19281341 | 133649 | 13335719 |
| 19281340 | 307584 | 18389159 |
| 19281339 | 89233 | 13391332 |
| 19281338 | 75250 | 12806684 |
| 19281337 | 100721 | 11823522 |
| 19281336 | 150137 | 13418403 |
+----------+--------+----------+
Query took: 0.004057791 seconds
You can experiment with the time it takes to generate queries when using non-accelerated datasets. You can change the acceleration setting from true
to false
in the datasets.yaml file.
Importing dataset from Dremio
Step 1. If you have a dataset hosted in Dremio, you can load it into the Spice Runtime as follows:
spice login dremio -u <USERNAME> -p <PASSWORD>
Step 2. If you haven’t already initialized a new project, you need to do so. Then, start the Spice Runtime.
spice init dremio-demo-project
spice run
Step 3. We now configure the dataset from Dremio:
spice dataset configure
Enter the name of the dataset:
dataset name: (default) my_dataset
Enter the description of the dataset:
description: my dataset in dremio
Specify the location of the dataset:
from: dremio:datasets.my_dataset
Select “y” when prompted whether to locally accelerate the dataset:
Locally accelerate (y/n)? y
We should now see the following output:
Dataset settings written to `datasets/my_dataset/dataset.yaml`!
If the login credentials were entered correctly, your dataset will have loaded into the runtime. You should see the following in the Spice runtime terminal :
2024-02-14T18:34:15.174564Z INFO spiced: Loaded dataset: my_dataset
2024-02-14T18:34:15.175189Z INFO runtime::datasource: Refreshing data for my_dataset
Step 4. Run queries against the dataset using the Spice SQL REPL.
In a new terminal, start the Spice SQL REPL
spice sql
You can now now query my_dataset
in the runtime.