Dockerizing Docker

If you feel overwhelmed by the breakdown of technologies Docker is built on, here is a cheat sheet to ease the pain 🙂
Just take a look at the new structure of the Docker platform. Many of its components are now offered as generalized building blocks that anyone can use to build a new container framework, and they are also used to build every new release of Docker's free open source tools as well as its paid enterprise products.

This new structure is part of the projects driven by the OCI (Open Container Initiative).

Users of Docker tools should not experience any change in their workflows, yet system builders now have common standard hooks they can use to stack their solutions into Docker and other container-based frameworks.

  1. Moby is a standard framework for system builders to create customized containers based on Docker or other engines. Moby container images are called Assemblies and usually contain a specific set of components such as InfraKit, LinuxKit, containerd, a JDK and a Java app.
  2. InfraKit is a toolkit for creating and managing self-healing infrastructure. InfraKit is designed to support the setup and management of base infrastructure. For example, it can help you manage a system like a cluster or a container orchestrator.
  3. LinuxKit is a toolkit for building custom minimal, immutable Linux distributions. It produces a hardened, minimized Linux image as the basis for building container images – based on a minimized, read-only Alpine Linux that is cryptographically verified and used for the initialization of a container. LinuxKit includes a timer that triggers a refresh of your container image, so you always run the latest, most secure baseline, which also reverts any changes an attacker may have made to your container.
  4. Containerd is the open source, generalized replacement for the dockerd daemon. It takes care of image retrieval, network namespaces and launching runC. Containerd includes a daemon exposing a gRPC API over a local UNIX socket – much more robust than the REST API previous versions of the dockerd daemon used.
  5. RunC is the CLI tool that spawns and runs the actual container according to the OCI runtime specification; it is used by Docker, rkt and other engines.
  6. Notary is the mechanism that cryptographically signs and verifies the images in its registry.
  7. SwarmKit is a toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, Raft-based consensus, task scheduling and more. SwarmKit takes care of cluster maintenance, including rotation of certificates.
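To make LinuxKit more concrete, here is a hedged sketch of the YAML build file it consumes – every image tag below is a placeholder, so pin real tags from the linuxkit organization before building:

```yaml
# Illustrative LinuxKit build file, consumed by `linuxkit build minimal.yml`.
# All image tags are placeholders, not pinned releases.
kernel:
  image: linuxkit/kernel:4.9.x          # minimal kernel image
  cmdline: "console=tty0"
init:
  - linuxkit/init:latest                # PID 1 and base userland
  - linuxkit/runc:latest                # OCI runtime used to start services
  - linuxkit/containerd:latest          # manages the service containers
onboot:
  - name: dhcpcd
    image: linuxkit/dhcpcd:latest       # one-shot network setup at boot
services:
  - name: sshd
    image: linuxkit/sshd:latest         # long-running service container
```

Everything the image runs is declared here; anything not listed simply is not in the resulting immutable distribution.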

Serverless and AWS Lambda Tips and Tools

  1. Use services that allow the integration of feature flags throughout your application to dynamically test, activate or suspend features that some of your users should be using. Here is one such service: https://github.com/launchdarkly/featureflags/blob/master/README.md
  2. Track your external libraries through services that can alert you to issues or vulnerabilities in those libraries – here is one such service, called “Snyk”: https://serverless.com/blog/4-ways-to-secure-prevent-vulnerabilities-in-serverless-applications/
  3. Store all your external libraries as local copies in your internal repositories so that you are not affected by mistakes or vulnerabilities in the public code repositories (such as the public NPM repository for JavaScript – see details here on how its left-pad incident badly affected many applications: https://www.theregister.co.uk/AMP/2016/03/23/npm_left_pad_chaos/ )
  4. Using a bigger memory tier also gets you a better CPU allocation, which can bring your transaction processing time from seconds to sub-second
  5. Monitor performance metrics of your application to determine when its behavior changed and why
  6. You must monitor for errors in your code. Don’t assume it’s working well
  7. Using Lambda inside a VPC requires the same attention to security groups as an EC2 instance
  8. Make sure your Lambda function has the least privileges required in its IAM policy
  9. AWS Toolkit for Eclipse: Support for Creating Maven Projects for AWS, Lambda, and Serverless Applications http://bit.ly/2muxucL
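For point 8, least privilege means the function's IAM role lists only the exact actions and resources it needs. A hedged sketch (the account ID, region and table name are hypothetical):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/MyAppTable"
    }
  ]
}
```

If the function only reads from the table, drop dynamodb:PutItem as well – start from nothing and add actions one by one.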

Ansible Tower 3.1 brings Workflows, Log Integration and Clustering

Ansible Tower 3.1 brings the new “Workflows” feature, allowing you to chain Playbooks together, set conditions for executing them, and pass data from one Playbook to another.

Additionally, Tower can now scale beyond a single instance, allowing job processing through any one of the Tower nodes in the cluster.

In Tower 3.1 you can easily direct all logs to a central log service such as ELK, Splunk, Loggly or others.

More information here: https://www.ansible.com/blog/introducing-asible-tower-3-1

AWS Athena says No so beautifully

Amazon AWS Athena allows you to run ANSI SQL directly against your S3 buckets, supporting a multitude of file and data formats.

Here are my insights taken from a comprehensive YouTube session led by Abhishek Sinha:

  • No ETL needed
  • No Servers or instances
  • No warmup required
  • No data load before querying
  • No need for DRP – it’s multi AZ

Athena uses Presto (an in-memory distributed data query engine) and Hive (DDL table creation referencing your S3 data).
You pay for the amount of data scanned, so you can optimize performance as well as cost if you:

  1. Compress your data
  2. Store it in a columnar format
  3. Partition it
  4. Convert it to Parquet / ORC format
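All four optimizations reduce the bytes Athena scans, which is exactly what you pay for. A back-of-the-envelope sketch in Python – the $5/TB price and the compression/column ratios below are illustrative assumptions, not measured numbers:

```python
# Rough Athena cost model: the bill is proportional to bytes scanned,
# so compression, columnar formats and partitioning cut cost directly.
PRICE_PER_TB = 5.00  # assumed USD per TB scanned; check current AWS pricing

def query_cost(tb_scanned: float) -> float:
    """Cost in USD of one query scanning `tb_scanned` terabytes."""
    return tb_scanned * PRICE_PER_TB

raw_tb = 1.0                       # full scan over 1 TB of raw CSV
compressed_tb = raw_tb / 3         # assume ~3:1 gzip compression
columnar_tb = compressed_tb * 0.1  # assume Parquet touches ~10% of columns

print(query_cost(raw_tb))                   # 5.0
print(round(query_cost(compressed_tb), 2))  # 1.67
print(round(query_cost(columnar_tb), 2))    # 0.17
```

Stacking compression on a columnar layout is multiplicative, which is why a naive CSV scan can cost an order of magnitude more than the same query over partitioned Parquet.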

Querying in Athena:

  1. You can query Athena via the AWS Console (dozens of queries can run in parallel) or using any JDBC enabled tool such as SQL Workbench
  2. You can stream Athena query results into S3 or AWS QuickSight (SPICE)
  3. Creating a table for querying in Athena is merely writing a schema that you later refer to
  4. Table schemas you create for queries are fully managed and highly available
  5. Queries act as the route to the data, so every time you execute a query it re-evaluates everything in the relevant buckets
  6. To create a partition you specify a key value and then a bucket and a prefix that points to the data that correlates with this partition
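Points 3 and 6 in DDL terms – a hedged HiveQL sketch of an Athena table over S3 plus one explicit partition (the bucket, prefix and columns are all hypothetical):

```sql
-- Hypothetical table over CSV logs stored in S3; the schema is only
-- a reference to the data, nothing is loaded.
CREATE EXTERNAL TABLE access_logs (
  request_time STRING,
  status_code  INT,
  bytes_sent   BIGINT
)
PARTITIONED BY (year STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-example-bucket/logs/';

-- Map the partition key value to the bucket/prefix holding its data.
ALTER TABLE access_logs ADD PARTITION (year = '2017')
  LOCATION 's3://my-example-bucket/logs/2017/';
```

A query filtered on year = '2017' then scans only that prefix, which feeds directly into the pay-per-data-scanned cost model above.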

Just note that Athena serves specific use cases (such as non-urgent ad-hoc queries), while other Big Data tools fulfill other needs – AWS Redshift is aimed at the quickest query times over large amounts of structured data, while AWS Kinesis Analytics is aimed at queries over rapidly streaming data.

Want to learn more on Big Data and AWS? Visit http://allcloud.io