If you are interested in Postgres FDWs, please check out Steampipe (https://steampipe.io). It's an open source project with a Go-based plugin model (similar to Terraform) to instantly query cloud services (AWS, GCP, GitHub, Slack, etc) using SQL. TBH, we've been completely shocked how far we can push this model querying real-time against APIs.
We're bullish on the ELT use case and think of virtual tables as "Data Rainbows" - structured, ephemeral access to cloud data. I spoke about this concept for YOW Data (https://youtu.be/2BNzIU5SFaw?t=183). Still structuring our thoughts here, so feedback and ideas would be greatly appreciated.
Disclaimers: Steampipe is open source. I'm a lead on the project. I can't stand listening to my own talk.
A completely fair point ... and one we debated at length.
Ultimately we decided that the convenience for users and documentation of a single location outweighed the benefits of following the basedir spec, particularly when deployed across operating systems. We looked at other tools, including terraform [1], in reaching a final decision. FWIW, we do let you customize the location of the "install dir" [2].
Hopefully the benefits of Steampipe outweigh the "peeve" factor in this case <grin>
Sadly I read that as "Other tools do their own thing so we decided to do our own thing too" :(
Though admittedly, "convenience for users" is a rather tricky thing to define, as you really need to define "which users". I imagine ~/.projectname is easier getting started out, but the long term management is why we (try) to have standard locations for things.
Agreed. Unfortunately requirements for long term management / large scale deployment aren't a priority until a tool is widely adopted, which is best achieved by keeping it simple. An interesting trade off...
I can really see how Steampipe rounds out a great DIY data pipeline. I've used Segment, Fivetran, Stitch Data, and Airbyte to shovel data into local storage, from RDBMS to Kafka, but this is definitely the most developer-friendly experience I've seen so far.
Already exploring using plugin metadata to do useful things in dbt data pipelines.
I really appreciate that, we're working hard to make it as developer friendly as possible. If you (or anyone else here) can spare a few mins to discuss use cases for Steampipe in data pipelines please drop me a note (email in bio)!
I noticed that, but it lists only 37 plugins, not 200+, as advertized here: https://steampipe.io/ I figured the remaining 163+ might be built in, instead of being supplied as plugins.
We currently have 37 open source plugins [1] like AWS, GCP, GitHub, Kubernetes, Hacker News, etc. More are in development by Turbot and community members [2]. Each plugin has many tables, e.g. 223 for AWS [3], 23 for GitHub [4].
The 200+ on the home page is referring to tables. Most tables have a single API source behind them, but others like aws_s3_bucket [5] have ~10 API calls for each row to collect related data like tags, versioning, etc.
(We're excited about the rapid growth of our plugins / tables, but can see that if you read it as plugin == data source then the 200+ would be wildly impressive.)
Steampipe is a new project we launched in Jan 2021. Turbot is leading the build, and has been working with cloud APIs since 2014. We're bootstrapped, and have a lot less than 100 people, but are hiring (engineering, marketing, tech writing) if anyone would like to get involved :-)
nathanwallace 2021-08-18 23:16:59 +0000 UTC [ - ]
We're bullish on the ELT use case and think of virtual tables as "Data Rainbows" - structured, ephemeral access to cloud data. I spoke about this concept for YOW Data (https://youtu.be/2BNzIU5SFaw?t=183). Still structuring our thoughts here, so feedback and ideas would be greatly appreciated.
Disclaimers: Steampipe is open source. I'm a lead on the project. I can't stand listening to my own talk.
kfdm 2021-08-19 04:19:56 +0000 UTC [ - ]
Under https://steampipe.io/docs/reference/config-files you put everything into ~/.steampipe I'm not a fan of random apps putting a random directory there. I prefer things to use the correct directory for the platform. Example https://specifications.freedesktop.org/basedir-spec/basedir-...
nathanwallace 2021-08-19 04:30:55 +0000 UTC [ - ]
Ultimately we decided that the convenience for users and documentation of a single location outweighed the benefits of following the basedir spec, particularly when deployed across operating systems. We looked at other tools, including terraform [1], in reaching a final decision. FWIW, we do let you customize the location of the "install dir" [2].
Hopefully the benefits of Steampipe outweigh the "peeve" factor in this case <grin>
1 - https://www.terraform.io/docs/cli/config/config-file.html 2 - https://steampipe.io/docs/reference/env-vars#steampipe_insta...
kfdm 2021-08-19 04:35:19 +0000 UTC [ - ]
Though admittedly, "convenience for users" is a rather tricky thing to define, as you really need to define "which users". I imagine ~/.projectname is easier getting started out, but the long term management is why we (try) to have standard locations for things.
nathanwallace 2021-08-19 04:44:45 +0000 UTC [ - ]
kfdm 2021-08-19 04:47:58 +0000 UTC [ - ]
tomnipotent 2021-08-19 06:42:47 +0000 UTC [ - ]
I can really see how Steampipe rounds out a great DIY data pipeline. I've used Segment, Fivetran, Stitch Data, and Airbyte to shovel data into local storage, from RDBMS to Kafka, but this is definitely the most developer-friendly experience I've seen so far.
Already exploring using plugin metadata to do useful things in dbt data pipelines.
nathanwallace 2021-08-19 11:48:27 +0000 UTC [ - ]
agent327 2021-08-19 09:46:58 +0000 UTC [ - ]
vforvendettador 2021-08-19 10:51:26 +0000 UTC [ - ]
And no, there isn't one for Steam.
agent327 2021-08-19 13:09:17 +0000 UTC [ - ]
nathanwallace 2021-08-19 14:48:29 +0000 UTC [ - ]
I expanded more on this at https://news.ycombinator.com/item?id=28229700.
Having said that, we find plugins are quick and fun to write so we'd love contributions or suggestions for any that would bring you value!
Dowwie 2021-08-19 09:27:45 +0000 UTC [ - ]
ccleve 2021-08-19 02:13:18 +0000 UTC [ - ]
nathanwallace 2021-08-19 02:33:24 +0000 UTC [ - ]
The 200+ on the home page is referring to tables. Most tables have a single API source behind them, but others like aws_s3_bucket [5] have ~10 API calls for each row to collect related data like tags, versioning, etc.
(We're excited about the rapid growth of our plugins / tables, but can see that if you read it as plugin == data source then the 200+ would be wildly impressive.)
1 - https://hub.steampipe.io/plugins 2 - https://github.com/topics/steampipe 3 - https://hub.steampipe.io/plugins/turbot/github/aws 4 - https://hub.steampipe.io/plugins/turbot/github/tables 5 - https://hub.steampipe.io/plugins/turbot/aws/tables/aws_s3_bu...
ccleve 2021-08-19 02:29:28 +0000 UTC [ - ]
nathanwallace 2021-08-19 02:37:34 +0000 UTC [ - ]
ccleve 2021-08-19 02:52:14 +0000 UTC [ - ]
eitland 2021-08-19 07:55:56 +0000 UTC [ - ]
Remember: WhatsApp used to be 50 people.
mulmen 2021-08-18 23:54:14 +0000 UTC [ - ]
nathanwallace 2021-08-19 00:01:03 +0000 UTC [ - ]
slt2021 2021-08-19 00:43:01 +0000 UTC [ - ]