Hugo Hacker News

Understanding Foreign Data Wrappers in Postgres and Postgres_fdw

nathanwallace 2021-08-18 23:16:59 +0000 UTC [ - ]

If you are interested in Postgres FDWs, please check out Steampipe (https://steampipe.io). It's an open source project with a Go-based plugin model (similar to Terraform) to instantly query cloud services (AWS, GCP, GitHub, Slack, etc) using SQL. TBH, we've been completely shocked how far we can push this model querying real-time against APIs.

We're bullish on the ELT use case and think of virtual tables as "Data Rainbows" - structured, ephemeral access to cloud data. I spoke about this concept for YOW Data (https://youtu.be/2BNzIU5SFaw?t=183). Still structuring our thoughts here, so feedback and ideas would be greatly appreciated.

Disclaimers: Steampipe is open source. I'm a lead on the project. I can't stand listening to my own talk.

kfdm 2021-08-19 04:19:56 +0000 UTC [ - ]

Very neat! Though I notice a particular pet peeve of mine . . .

Under https://steampipe.io/docs/reference/config-files you put everything into ~/.steampipe I'm not a fan of random apps putting a random directory there. I prefer things to use the correct directory for the platform. Example https://specifications.freedesktop.org/basedir-spec/basedir-...

nathanwallace 2021-08-19 04:30:55 +0000 UTC [ - ]

A completely fair point ... and one we debated at length.

Ultimately we decided that the convenience for users and documentation of a single location outweighed the benefits of following the basedir spec, particularly when deployed across operating systems. We looked at other tools, including terraform [1], in reaching a final decision. FWIW, we do let you customize the location of the "install dir" [2].

Hopefully the benefits of Steampipe outweigh the "peeve" factor in this case <grin>

1 - https://www.terraform.io/docs/cli/config/config-file.html 2 - https://steampipe.io/docs/reference/env-vars#steampipe_insta...

kfdm 2021-08-19 04:35:19 +0000 UTC [ - ]

Sadly I read that as "Other tools do their own thing so we decided to do our own thing too" :(

Though admittedly, "convenience for users" is a rather tricky thing to define, as you really need to define "which users". I imagine ~/.projectname is easier getting started out, but the long term management is why we (try) to have standard locations for things.

nathanwallace 2021-08-19 04:44:45 +0000 UTC [ - ]

Agreed. Unfortunately requirements for long term management / large scale deployment aren't a priority until a tool is widely adopted, which is best achieved by keeping it simple. An interesting trade off...

kfdm 2021-08-19 04:47:58 +0000 UTC [ - ]

All the more reason to handle it sooner when its less painful to do the right thing.

tomnipotent 2021-08-19 06:42:47 +0000 UTC [ - ]

> We're bullish on the ELT

I can really see how Steampipe rounds out a great DIY data pipeline. I've used Segment, Fivetran, Stitch Data, and Airbyte to shovel data into local storage, from RDBMS to Kafka, but this is definitely the most developer-friendly experience I've seen so far.

Already exploring using plugin metadata to do useful things in dbt data pipelines.

nathanwallace 2021-08-19 11:48:27 +0000 UTC [ - ]

I really appreciate that, we're working hard to make it as developer friendly as possible. If you (or anyone else here) can spare a few mins to discuss use cases for Steampipe in data pipelines please drop me a note (email in bio)!

agent327 2021-08-19 09:46:58 +0000 UTC [ - ]

Looks neat! Is there a list of those 200+ data sources somewhere? Is there one for the Steam database?

vforvendettador 2021-08-19 10:51:26 +0000 UTC [ - ]

There is this concept of "Plugins" that will tell you where you can pull the data from: https://hub.steampipe.io/plugins

And no, there isn't one for Steam.

agent327 2021-08-19 13:09:17 +0000 UTC [ - ]

I noticed that, but it lists only 37 plugins, not 200+, as advertized here: https://steampipe.io/ I figured the remaining 163+ might be built in, instead of being supplied as plugins.

nathanwallace 2021-08-19 14:48:29 +0000 UTC [ - ]

It's a pure plugin model, none built in. On the home page we considered data source ~= API ~= table, sorry if that was not clear.

I expanded more on this at https://news.ycombinator.com/item?id=28229700.

Having said that, we find plugins are quick and fun to write so we'd love contributions or suggestions for any that would bring you value!

Dowwie 2021-08-19 09:27:45 +0000 UTC [ - ]

What backend has had the most surprising benefits by adding a SQL frontend to it?

ccleve 2021-08-19 02:13:18 +0000 UTC [ - ]

Your website says 200+ data sources. How the heck did you generate so many? This project does not look very old.

nathanwallace 2021-08-19 02:33:24 +0000 UTC [ - ]

We currently have 37 open source plugins [1] like AWS, GCP, GitHub, Kubernetes, Hacker News, etc. More are in development by Turbot and community members [2]. Each plugin has many tables, e.g. 223 for AWS [3], 23 for GitHub [4].

The 200+ on the home page is referring to tables. Most tables have a single API source behind them, but others like aws_s3_bucket [5] have ~10 API calls for each row to collect related data like tags, versioning, etc.

(We're excited about the rapid growth of our plugins / tables, but can see that if you read it as plugin == data source then the 200+ would be wildly impressive.)

1 - https://hub.steampipe.io/plugins 2 - https://github.com/topics/steampipe 3 - https://hub.steampipe.io/plugins/turbot/github/aws 4 - https://hub.steampipe.io/plugins/turbot/github/tables 5 - https://hub.steampipe.io/plugins/turbot/aws/tables/aws_s3_bu...

ccleve 2021-08-19 02:29:28 +0000 UTC [ - ]

Oh, I see. The parent company, Turbot, has 100+ employees and has been at this for a while.

nathanwallace 2021-08-19 02:37:34 +0000 UTC [ - ]

Steampipe is a new project we launched in Jan 2021. Turbot is leading the build, and has been working with cloud APIs since 2014. We're bootstrapped, and have a lot less than 100 people, but are hiring (engineering, marketing, tech writing) if anyone would like to get involved :-)

ccleve 2021-08-19 02:52:14 +0000 UTC [ - ]

Crunchbase says 101-250 employees, so they must have guessed wrong. https://www.crunchbase.com/organization/turbot Hope you get to that level soon, looks like a great product.

eitland 2021-08-19 07:55:56 +0000 UTC [ - ]

It will actually be better for everyone the smaller the number of employees is as long as all positions are covered and the bus factor is acceptable.

Remember: WhatsApp used to be 50 people.

mulmen 2021-08-18 23:54:14 +0000 UTC [ - ]

What’s YOW data?

nathanwallace 2021-08-19 00:01:03 +0000 UTC [ - ]

YOW! Data 2021 (https://yowconference.com/data/) was an online conference held in May. Talks are all now available on YouTube (https://www.youtube.com/playlist?list=PLIpl4GKFQR6e2iVY2DZbp...).

slt2021 2021-08-19 00:43:01 +0000 UTC [ - ]

great idea, great product, great job!