An Introduction to The Reactor

Oct 2, 2017

Once upon a time, you had an AWS Account. Every day, you fought the console and CLI, kicking and screaming, for the information you needed to debug your production issues. Out of frustration, you installed the CloudZero Reactor, which builds a semantic map of your cloud events and resources. Using the context derived from that semantic map, you were able to get clear answers to your cloud questions. Eventually, you forgot AWS had a console or command line, switched your pager-attached mobile device to silent, and caught some much-needed zzzs.

What is The Reactor?

I’m excited to give you a Wonka-worthy tour of the inside of the CloudZero Reactor, the nerve-center of joyous cloud reliability.

The Reactor ingests data, curates Events and Resources, and exposes that cultivated knowledge for downstream processing. The following architecture diagram shows the basic components in a Connected Account (on the left) and The Reactor (on the right):

The Reactor comprises Lambda Functions, S3 Buckets, DynamoDB Tables, and SNS Topics. Where we need invocation over HTTP (for our CLI), API Gateway fronts the Lambda Functions. These AWS services keep the monetary cost of running The Reactor low. They also play very well together because none of them requires a VPC, and running Lambda Functions inside a VPC can be very costly for performance.

Why Open Source The Reactor?

The Reactor is an Open Source project for three reasons.

1. We want you to know there is “No Funny Stuff” in The Reactor. We will soon be launching a hosted version of The Reactor, which will be the same code you see in the GitHub repository. Transparency is a core value at CloudZero, both in our culture and our code.
2. Our team stumbled into some interesting patterns using some newer AWS services like Organizations, Athena, and Lambda. We will cover some of those below, and we will continue to share our learning via this blog and code in the coming months.
3. We need your feedback. Is The Reactor useful? Are our tools usable? Please yell loudly if you have feedback.

What has The Reactor done for me lately?

The Reactor provides two tangible benefits right away.

SimpleQuery

The CloudZero CLI connects to one or more CloudZero Reactors and provides, among other useful functions, cz query --interactive, which will drop you pleasantly onto a REPL pillow bed for executing SQL queries against normalized CloudTrail data.

There are some great blog posts about Athena: Using Athena to Query CloudTrail, Using Athena to Query S3 for CloudTrail, and Analyzing VPC Flow Logs with Athena. The Reactor automates setting up the necessary trappings: the source and results S3 Buckets and Athena metadata. Additionally, The Reactor automatically groups multiple AWS Accounts for cross-account querying!
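To give a feel for what that setup is for, here is a minimal sketch of running a query through the Athena API from Python. It is not The Reactor's code. The function name `run_athena_query` and its parameters are ours for illustration; the `athena` argument is assumed to be a boto3 Athena client (for example `boto3.client("athena")`), and the database name and results location are whatever your own setup uses.

```python
import time


def run_athena_query(athena, sql, database, output_location,
                     poll_seconds=1.0, max_polls=120):
    """Run one SQL statement on Athena and return rows as dicts.

    `athena` is assumed to be a boto3 Athena client. Only the first
    page of results is read, which is enough for a small sketch.
    """
    execution_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena is asynchronous, so poll until the query finishes.
    for _ in range(max_polls):
        execution = athena.get_query_execution(QueryExecutionId=execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Query {execution_id} ended in state {state}")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Query {execution_id} did not finish in time")

    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    table = [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
    if not table:
        return []
    # For SELECT statements the first row holds the column names.
    header, body = table[0], table[1:]
    return [dict(zip(header, values)) for values in body]
```

Passing the client in as an argument keeps the function easy to test with a stub, and it makes the two pieces of plumbing visible: a database to query and an S3 location for results.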

The Reactor and cz query CLI together provide a pleasurable experience for querying your AWS cloud data. For example, getting all recent events triggered by a CLI session:

[Recording: SimpleQuery demo, recorded by cztank on asciinema.org]
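As a rough idea of what such a query might look like, here is a small sketch that builds the SQL text in Python. The table name `cloudtrail_events` and the column names are assumptions borrowed from the raw CloudTrail record fields; The Reactor's normalized schema may well use different names. The only fact it leans on is that the AWS CLI identifies itself with a user agent starting with `aws-cli/`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical table name; substitute whatever your setup exposes.
TABLE = "cloudtrail_events"


def recent_cli_events_sql(hours: int = 24, now: Optional[datetime] = None) -> str:
    """Build SQL selecting recent events that were issued from the AWS CLI."""
    now = now or datetime.now(timezone.utc)
    # CloudTrail timestamps are ISO 8601 strings in UTC, so a plain
    # string comparison orders them correctly.
    since = (now - timedelta(hours=hours)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        "SELECT eventtime, eventsource, eventname\n"
        f"FROM {TABLE}\n"
        "WHERE useragent LIKE 'aws-cli/%'\n"
        f"  AND eventtime >= '{since}'\n"
        "ORDER BY eventtime DESC\n"
        "LIMIT 50"
    )
```

The resulting text is the kind of statement you would type at the interactive prompt, adjusted to the column names you actually see there.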

Code Examples/Patterns

The Reactor demonstrates some useful Lambda, Athena, Organizations, and STS patterns. For example, AWS Lambda Best Practices suggests using environment variables for loading settings into your functions; however, anyone who has used os.environ knows the perils of testing side-effecting code. We use composable decorators to solve this problem, and we will walk through that code in more detail in a future blog post.
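Until that post arrives, here is one way the idea could look. This is our own minimal sketch, not The Reactor's implementation: the decorator name `env_setting`, the setting names `RESULTS_BUCKET` and `MAX_ROWS`, and the handler body are all hypothetical. Each decorator reads one environment variable at call time and passes it to the function as a keyword argument, so tests can supply the value directly and never touch os.environ.

```python
import functools
import os


def env_setting(name, default=None, cast=str):
    """Inject the environment variable `name` as a lowercase keyword argument.

    The lookup happens on every call, and an explicitly passed keyword
    argument always wins, which is what keeps the function testable.
    """
    keyword = name.lower()

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if keyword not in kwargs:
                raw = os.environ.get(name, default)
                kwargs[keyword] = cast(raw) if raw is not None else None
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Decorators stack, so each setting is declared on its own line.
@env_setting("RESULTS_BUCKET", default="example-results")
@env_setting("MAX_ROWS", default="100", cast=int)
def handler(event, context, results_bucket, max_rows):
    """Hypothetical Lambda handler that only reports its settings."""
    return {"bucket": results_bucket, "limit": max_rows, "query": event.get("query")}
```

In a test you can call `handler({"query": "q"}, None, results_bucket="b", max_rows=5)` and the environment is never read; in Lambda, the runtime calls `handler(event, context)` and the decorators fill in the rest.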

We mentioned Athena briefly when introducing SimpleQuery. Athena is a relatively new service that enables querying data directly out of S3. The AWS docs say to use Athena for querying unstructured, semi-structured, and structured data, which … um … we guess means we should be using it all the time. In all seriousness, it is great for querying semi-structured event data like CloudTrail. However, there are definite pitfalls and documentation holes, especially around the Hive DDL and supported compression algorithms (hint: use the correct file extension). The Presto SQL engine docs are great, but comparing data types between Presto and Hive can be confusing. We hope the code illuminates what we’ve learned here as well.
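To make the file-extension hint concrete, here is a small sketch of writing gzip-compressed JSON lines with an extension that matches the compression. The helper name `write_gzip_json_lines` and the sample record are ours, and the sketch writes to a local directory; uploading the file to your source bucket is left out.

```python
import gzip
import json
import os


def write_gzip_json_lines(records, directory, basename):
    """Write one JSON object per line, gzip-compressed, to <basename>.json.gz.

    The ".gz" suffix matters: it tells the reader which compression
    was used, so it should always match the actual file contents.
    """
    path = os.path.join(directory, basename + ".json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return path
```

For example, `write_gzip_json_lines([{"eventname": "RunInstances"}], "/tmp", "events")` produces `/tmp/events.json.gz`, a file whose name and contents agree about the compression.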

Are you using AWS Organizations? If not, do so immediately. Even for your personal account. Nay, especially for your personal account. It is a way of reducing risk, improving cohesion, and automating sandboxes at no financial cost. For example, want to test out The Reactor? Simply turn on AWS Organizations, create an empty sub-account, deploy The Reactor, and connect the new sub-account. You now have a sandboxed POC that you can easily tear down by deleting the entire sub-account.
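If you would rather script the sub-account step, a sketch along these lines may help. It is our own illustration, not part of The Reactor: the function name `create_sandbox_account` is hypothetical, and `org_client` is assumed to be a boto3 Organizations client (for example `boto3.client("organizations")`) used from the organization's master account. Account creation is asynchronous, so the sketch polls until it finishes.

```python
import time


def create_sandbox_account(org_client, email, name, poll_seconds=5, max_polls=60):
    """Request a new member account and return its ID once creation succeeds.

    `org_client` is assumed to be a boto3 Organizations client.
    """
    status = org_client.create_account(
        Email=email, AccountName=name
    )["CreateAccountStatus"]

    polls = 0
    while True:
        if status["State"] == "SUCCEEDED":
            return status["AccountId"]
        if status["State"] == "FAILED":
            raise RuntimeError(
                f"Account creation failed: {status.get('FailureReason')}"
            )
        if polls >= max_polls:
            raise TimeoutError("Account creation did not finish in time")
        time.sleep(poll_seconds)
        polls += 1
        # Ask for the latest status of the same creation request.
        status = org_client.describe_create_account_status(
            CreateAccountRequestId=status["Id"]
        )["CreateAccountStatus"]
```

The returned account ID is what you would then connect to The Reactor. The email address must be one that is not already tied to another AWS account.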