Minimal example to run Presto with Minio and the Hive standalone metastore on Docker. The data in this tutorial was converted into an Apache Parquet file from the famous Iris data set.
Install s3cmd and the Java runtime needed by the Presto CLI with:
sudo apt update
sudo apt install -y \
s3cmd \
openjdk-11-jre-headless # Needed for presto-cli
Pull and run all services with:
docker-compose up
Configure s3cmd with (or use the minio.s3cfg configuration):
s3cmd --config minio.s3cfg --configure
Use the following values when prompted:
Access Key: minio_access_key
Secret Key: minio_secret_key
Default Region [US]:
S3 Endpoint [s3.amazonaws.com]: localhost:9000
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: localhost:9000
Encryption password:
Path to GPG program [/usr/bin/gpg]:
Use HTTPS protocol [Yes]: no
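The interactive prompts can also be skipped by writing the configuration file directly. The following is a minimal sketch, assuming the standard s3cmd option names (access_key, secret_key, host_base, host_bucket, use_https). The output file name minio-generated.s3cfg is hypothetical, chosen so that the sketch does not overwrite the minio.s3cfg shipped with this repository:

```shell
# Hypothetical output path; override with CFG=... if needed.
CFG="${CFG:-minio-generated.s3cfg}"

# Write the same answers as the interactive prompts above.
cat > "$CFG" <<'EOF'
[default]
access_key = minio_access_key
secret_key = minio_secret_key
host_base = localhost:9000
host_bucket = localhost:9000
use_https = False
EOF

# The file holds credentials, so restrict it to the current user.
chmod 600 "$CFG"
echo "Wrote $CFG"
```

The generated file can then be passed to s3cmd in place of minio.s3cfg, for example: s3cmd --config minio-generated.s3cfg ls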
To create a bucket and upload data to minio, type:
s3cmd --config minio.s3cfg \
mb s3://iris
s3cmd --config minio.s3cfg \
put data/iris.parq s3://iris/iris_parquet/iris.parq
To list all objects in all buckets, type:
s3cmd --config minio.s3cfg la
Download the Presto CLI with:
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.253/presto-cli-0.253-executable.jar \
-O presto
chmod +x presto # Make it executable
Create schema and create table with:
./presto --execute "
CREATE SCHEMA IF NOT EXISTS minio.iris
WITH (location = 's3a://iris/');
CREATE TABLE IF NOT EXISTS minio.iris.iris_parquet (
sepal_length DOUBLE,
sepal_width DOUBLE,
petal_length DOUBLE,
petal_width DOUBLE,
class VARCHAR
)
WITH (
external_location = 's3a://iris/iris_parquet',
format = 'PARQUET'
);"
Query the newly created table with:
./presto --execute "
SHOW TABLES IN minio.iris;
SELECT * FROM minio.iris.iris_parquet LIMIT 5;"
Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,3,Mr. Owen Harris Braund,male,22,1,0,7.25
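The CSV sample above starts with a header row. An external table over a CSV file reads every line as data, so unless the table is configured to skip it, the header would show up as a row. Below is a minimal sketch of removing the header before upload. It assumes a hypothetical local file titanic.csv (created here from the two sample lines so the snippet is self-contained) and the testbucket bucket used in the statements that follow. The upload itself is left commented out because it needs the running Minio service:

```shell
# Hypothetical local file holding the sample shown above.
cat > titanic.csv <<'EOF'
Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,3,Mr. Owen Harris Braund,male,22,1,0,7.25
EOF

# Drop the first line (the header) and keep only the data rows.
tail -n +2 titanic.csv > titanic_noheader.csv

# Show what would be uploaded.
cat titanic_noheader.csv

# Upload into the directory the external table points at (requires Minio):
# s3cmd --config minio.s3cfg mb s3://testbucket
# s3cmd --config minio.s3cfg put titanic_noheader.csv s3://testbucket/titanic/titanic.csv
```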
./presto --execute "
CREATE TABLE hive.testbucket.titanic (
Survived INTEGER,
Pclass INTEGER,
Name VARCHAR,
Sex VARCHAR,
Age INTEGER,
SiblingsSpouses INTEGER,
ParentsChildren INTEGER,
Fare DOUBLE)
WITH (FORMAT = 'CSV',
csv_separator = ',',
external_location = 's3a://testbucket/titanic');"
./presto --execute "
CREATE TABLE hive.testbucket.titanic (
Survived int,
Pclass int,
Name varchar,
Sex varchar,
Age int,
SiblingsSpouses int,
ParentsChildren int,
Fare double)
WITH (FORMAT = 'TEXTFILE',
csv_separator = ',',
external_location = 's3a://testbucket/titanic');"
Create new table in PostgreSQL from TPCDS:
./presto --execute "
CREATE TABLE postgresql.public.item AS
SELECT i_item_id, i_item_desc
FROM tpcds.tiny.item;"
Create new table in Minio from TPCDS:
# Create new bucket
s3cmd --config minio.s3cfg mb s3://data
./presto --execute "
CREATE SCHEMA IF NOT EXISTS minio.data
WITH (location = 's3a://data/');
CREATE TABLE minio.data.item
WITH (format = 'PARQUET') AS
SELECT i_item_id, i_item_desc
FROM tpcds.tiny.item;"
This project is licensed under the MIT license. See the LICENSE for details.