What the Heck is PuppyGraph?

Feb 26, 2024

Introduction

What the heck is PuppyGraph? That was the first thing I asked myself when I came across it in the summer of 2023. My involvement with Apache Iceberg brought it to my attention: the team behind it wanted to add Iceberg support to PuppyGraph, but just what the heck is it?

This blog is going to be more of a hot take on PuppyGraph to get you thinking about how you might use it in your own projects. I have no affiliation with the company or project other than thinking it is pretty cool. Co-founder Weimo Liu recently (Feb 2024) gave a presentation at the Chill Data Summit that was interesting and well received, according to my friends who were there.

What is PuppyGraph?

Simply put, PuppyGraph is a cloud-native graph data lakehouse that provides a graph analytics engine for your data. It addresses graph scalability by auto-sharding data and keeping compute and storage separate, much like the lakehouse design. The result is a graph data warehouse, a data lake, and multiple data models on a single copy of your data. That means you can do some pretty cool graph analysis on your data in any of the supported formats.

What can it connect to?

PuppyGraph has rapidly added support for various platforms, catalogs, and connection engines. Currently, we see:

Apache Iceberg
Apache Hudi
Delta Lake
MySQL
PostgreSQL
DuckDB
BigQuery
Redshift
LanceDB (coming soon)
JDBC Catalog
Data Lake Catalog
– Hive Metastore
– AWS Glue

Their SaaS interface also gives you direct access to both a Gremlin and a Cypher console for running graph queries, in addition to a graph notebook built on Jupyter.

Using PuppyGraph

A Docker container is provided so you can get started on a local machine. You’ll need a schema in JSON format that describes your data layout to PuppyGraph. Once you upload it and it is verified, away you go.
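To make the schema step a bit more concrete, here is a minimal Python sketch of what a JSON schema that maps tables to vertices and edges could look like. The field names (label, table, id_column, from, to) and the table names are assumptions made for illustration, not PuppyGraph's actual schema format, so check their documentation for the real layout. The check_schema helper is also hypothetical; it only confirms that every edge points at a vertex label that has been defined.

```python
import json

# Hypothetical schema: field and table names are illustrative only,
# NOT PuppyGraph's actual schema format.
schema = {
    "vertices": [
        {"label": "person", "table": "people", "id_column": "id",
         "attributes": ["name"]},
        {"label": "company", "table": "companies", "id_column": "id",
         "attributes": ["name", "city"]},
    ],
    "edges": [
        {"label": "works_at", "table": "employment",
         "from": "person", "from_column": "person_id",
         "to": "company", "to_column": "company_id"},
    ],
}


def check_schema(schema):
    """Return a list of problems; an empty list means the schema is consistent."""
    problems = []
    labels = set()
    for vertex in schema.get("vertices", []):
        if vertex["label"] in labels:
            problems.append(f"duplicate vertex label {vertex['label']!r}")
        labels.add(vertex["label"])
    for edge in schema.get("edges", []):
        for end in ("from", "to"):
            if edge[end] not in labels:
                problems.append(
                    f"edge {edge['label']!r}: unknown {end!r} vertex {edge[end]!r}"
                )
    return problems


# Serialize to the JSON text you would save to a schema file.
schema_json = json.dumps(schema, indent=2)
print(check_schema(schema))
```

The point of the sketch is the shape of the idea: the schema is metadata that tells the engine which existing tables act as vertices and which act as edges, while the data itself stays where it is.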

The integrated graph browser is pretty nifty. You can easily zoom in/out to see the clustering and attributes in addition to queries.

Zooming in further, we can see more of the details:

Clicking on a node will give us a pop-up with its details:

This allows you to explore different vertices and edges easily. These static pictures don’t really convey how fast it is or how much fun it is to bounce around your data. I should have used some genealogical data for fun.

Because PuppyGraph uses the Gremlin and Cypher query languages, third-party UI tools that speak those languages should also be compatible. A real advantage here is that PuppyGraph works on the data where it lives and isn’t making you copy it elsewhere. Without going into the particulars of a specific platform, this gives you a general idea of what features and functions are available.
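If you have not used Gremlin or Cypher before, the sketch below shows the kind of question they answer. It uses a tiny in-memory graph in plain Python (made-up data, no PuppyGraph connection) and walks one hop along an edge; the comments show how the same question is usually written in each query language. The vertex label, edge label, and names are all assumptions for illustration.

```python
# A tiny in-memory stand-in for a property graph (made-up data).
vertices = {
    1: {"label": "person", "name": "alice"},
    2: {"label": "person", "name": "bob"},
    3: {"label": "person", "name": "carol"},
}
edges = [
    (1, "knows", 2),  # (source id, edge label, destination id)
    (1, "knows", 3),
    (2, "knows", 3),
]


def out_names(start_name, edge_label):
    """Names of vertices one outgoing `edge_label` hop away from `start_name`."""
    start_ids = {
        vid for vid, v in vertices.items()
        if v["label"] == "person" and v["name"] == start_name
    }
    return sorted(
        vertices[dst]["name"]
        for src, label, dst in edges
        if src in start_ids and label == edge_label
    )


# The same question in the two query languages:
#   Gremlin: g.V().hasLabel('person').has('name', 'alice').out('knows').values('name')
#   Cypher:  MATCH (a:person {name: 'alice'})-[:knows]->(b) RETURN b.name
print(out_names("alice", "knows"))  # ['bob', 'carol']
```

A graph engine does the same kind of hop for you at scale, which is why being able to run these queries directly over existing tables is appealing.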

Summary

Certainly, graph databases and their representation don’t apply as generically as a structured database does, but we are seeing more and more how these kinds of data representations are being used to model the real world. I didn’t see any indication that this is an open-source project, and I didn’t find it on GitHub. There is no mention of pricing, so I’m not sure where they are going with all of this. The documentation isn’t amazing, but it seems to be enough to get started and try it out. Overall, this is a fun project to play with. I need to percolate on it more to see where I might use it, but I can envision some interesting use cases combining it with other self-contained projects like DuckDB and LanceDB.

Check out my other What the Heck is… articles at the links below: