## What does ParkerDB do?
If you are a data scientist or data engineer:
- Do you generate a daily Hive table with machine learning features for each user?
- Does the table need to be queried by user ID with low latency and high concurrency?
More generally, the common characteristics are:
- A large table with many rows, updated hourly, daily, or weekly.
- The data must be queried by a key with low latency and high concurrency.
ParkerDB helps data scientists and data engineers to
- Publish their full tables as frequently as necessary.
- Serve fast, scalable data lookups.
## How does ParkerDB help?
```mermaid
graph TD
    A[Data Warehouse] -->|Daily Job| B[Hive Table]
    S[Snowflake] -->|Daily Job| ST[Table]
    ST -->|Copy Into| C
    B --> C[AWS S3]
    C -->|Pull| D[ParkerDB Lookup Service]
    D --> E[Apps]
```
ParkerDB pulls the Hive table data from AWS S3 and optimizes it for fast lookup, delivering millisecond query latency and tens of thousands of queries per second on each server. The data is ready to query as soon as it has been pulled down from S3.
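The fast lookups rest on a simple idea: data sorted by the lookup key can be searched in O(log n) steps instead of being scanned. A minimal pure-Python sketch of that principle (the record layout and function name here are illustrative, not ParkerDB's internal format or API):

```python
import bisect

# Feature rows sorted by user_id, as they would arrive from a sorted
# Hive table. (user_id, features) pairs -- illustrative data only.
rows = [
    (101, {"clicks_7d": 12, "purchases_30d": 1}),
    (205, {"clicks_7d": 3, "purchases_30d": 0}),
    (742, {"clicks_7d": 40, "purchases_30d": 5}),
]
keys = [k for k, _ in rows]  # parallel list of sort keys for bisect

def lookup(user_id):
    """Binary-search the sorted rows; O(log n) per query."""
    i = bisect.bisect_left(keys, user_id)
    if i < len(keys) and keys[i] == user_id:
        return rows[i][1]
    return None

print(lookup(205))  # features for user 205
print(lookup(999))  # None: key not present
```

A production lookup service adds an on-disk layout, caching, and a network API on top, but the sorted-key access pattern is the reason the input table must be sorted.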
## How to use ParkerDB?
- Create a Hive table with the data sorted by the lookup key.
- Run a daily job to publish the data to AWS S3.
- Use the ParkerDB Lookup Service to query the data.
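The first two steps above can be sketched end to end. This pure-Python sketch stands in for the Hive and S3 tooling: it sorts records by the lookup key and writes them to a local CSV file in place of an S3 object (the file name, columns, and format are assumptions for illustration):

```python
import csv
import os
import tempfile

# Unsorted daily feature rows, as a warehouse job might produce them.
records = [
    {"user_id": 742, "clicks_7d": 40},
    {"user_id": 101, "clicks_7d": 12},
    {"user_id": 205, "clicks_7d": 3},
]

# Step 1: sort by the lookup key, as ParkerDB expects.
records.sort(key=lambda r: r["user_id"])

# Step 2: "publish" -- a local CSV file stands in for the S3 upload here.
out_path = os.path.join(tempfile.gettempdir(), "user_features.csv")
with open(out_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["user_id", "clicks_7d"])
    writer.writeheader()
    writer.writerows(records)

# Step 3 would be querying the published table through the
# ParkerDB Lookup Service.
```

In practice the sort happens inside the Hive job (e.g. an ORDER BY on the key) and the publish step is the daily job's write to your S3 bucket.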
## How much does ParkerDB cost?
We provide a quote based on your usage: pricing depends on data size, query rate, and query latency. You can expect the cost to be roughly 20% of DynamoDB with on-demand capacity and a comparable configuration.
## How to get started?
We are currently in beta mode.
Email us at support at parkerdb.com with the following information:
- Your company name
- Your use case
- Your expected data size
- Your expected query rate
- Your expected query latency
- AWS S3 bucket region
We will get back to you within 24 hours. You may then need to provide:
- Your AWS S3 bucket name, region, and read-only credentials
- Table name
- Primary key column name
After that, we will provide you with a gRPC API endpoint that you can use to query the data.
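To show what the query pattern looks like from application code, here is a stand-in client class. It is hypothetical: the class, method, and endpoint names are placeholders, and a real integration would use a gRPC channel and the stubs generated from ParkerDB's proto definitions.

```python
class ParkerDBClient:
    """Stand-in for a generated gRPC stub -- hypothetical, for illustration."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        # A real client would open a gRPC channel to the endpoint here;
        # this sketch fakes the published table with an in-memory dict.
        self._table = {101: {"clicks_7d": 12}, 205: {"clicks_7d": 3}}

    def get(self, table, key):
        # A real call would be a unary gRPC request to the Lookup Service,
        # passing the table name and key; here we just read the dict.
        return self._table.get(key)

# Placeholder endpoint -- yours will come from the onboarding email.
client = ParkerDBClient("lookup.example.parkerdb.com:443")
features = client.get("user_features", 205)
print(features)
```

The shape of the integration is the point: one client per endpoint, one keyed `get` per query, with the returned feature map consumed directly by your application.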