Why Immutable, Declarative Infrastructure is so cool (Terraform, Docker, Packer, Ansible)

Reading through “Why we use Terraform and not Chef, Puppet, Ansible, SaltStack, or CloudFormation”, you should see why immutable infrastructure is so powerful.

But I would not drop Ansible altogether…

Here are some rules to look into (a small pipeline sketch follows the list):

  1. Build it immutable – because you can then scale easily, recover easily, and have a consistent source for testing, so you deploy exactly what you tested
  2. Use Terraform to create the immutable infrastructure setup
  3. Use Packer to create images that can be deployed anywhere – AWS, GCE, Vagrant, OpenStack
  4. Use Ansible to script changes on top of your images if needed. Ansible is not immutable by itself, but it gives you a cleaner, reusable baseline to replace your scattered scripts
  5. In Ansible, use modules before you script, and roles before you duplicate earlier effort. Playbooks are your replacement for scripts.
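
As mentioned above, here is a minimal sketch of that bake-then-deploy pipeline. It simply shells out to the real tools; the file names (web.json, site.yml, inventory) and the infra/ directory are placeholders for your own templates, not a prescribed layout:

```python
#!/usr/bin/env python3
"""Sketch of rules 2-4: bake an immutable image with Packer, roll it out with
Terraform, then (only if needed) layer changes with Ansible."""
import subprocess

def run(cmd, cwd=None):
    print("-->", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)

# 1. Bake the machine image (AMI, GCE image, Vagrant box, ...) from a Packer template.
run(["packer", "build", "web.json"])

# 2. Replace instances with the new image instead of mutating running ones.
run(["terraform", "init"], cwd="infra")
run(["terraform", "apply", "-auto-approve"], cwd="infra")

# 3. Optional: a thin Ansible layer on top of the freshly created instances.
run(["ansible-playbook", "-i", "inventory", "site.yml"])
```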

More info:

Using Packer and Ansible to Build Immutable Infrastructure

Lessons from using Ansible exclusively for 2 years

Sweet And Cool! Kubernetes 1.4: Making it easy to run on Kubernetes anywhere

Sweet new features!

  1. Clusters can span regions and cloud providers (AWS, GCP)
  2. Set up a cluster in just 3 commands, including the overlay (internal) network
  3. Simple Kubernetes installation using plain yum/apt-get, or run it directly from the GCP platform (soon)
  4. A single VIP points to a load balancer that can forward traffic to federated cluster nodes across regions

More info:


AWS Aurora vs. AWS RDS MySQL – 12 Criteria Power Check-List


First, take a look at the recent AWS April 2016 Webinar Series – Migrating your Databases to Aurora and the AWS June 2016 Webinar Series – Amazon Aurora Deep Dive – Optimizing Database Performance session led by Puneet Agarwal.

Here are the main criteria I would look into when choosing AWS Aurora over AWS RDS MySQL (a short boto3 sketch follows the list):

  1. Faster recovery from instance failure (5x or more vs. MySQL)
  2. Consistently lower impact on the primary replica
  3. You need additional throughput (theoretically 5x for the same resources vs. MySQL). Aurora achieves this by decoupling the cache and storage sub-systems and spreading them across many nodes, and by committing the log first while the actual DB manipulation is applied asynchronously.
  4. You are already using, or can migrate to, MySQL 5.6
  5. You are comfortable with the Aurora I/O mechanism (16K reads, 4K writes, both batched when smaller)
  6. You get more replicas (a maximum of 15 vs. 5 in MySQL)
  7. You can prioritise which replicas are the recovery targets and size replicas differently from the master
  8. You need virtually no replication lag – since replica nodes share the same storage the master uses
  9. You can decide about encryption at rest at DB creation time
  10. You accept working with the InnoDB engine alone
  11. You want to eliminate the need for cache warming
  12. You can accept roughly 20% additional pricing over MySQL to gain all the above 🙂
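
To make criteria 7 and 9 concrete, here is a rough boto3 sketch: encryption at rest can only be chosen when the cluster is created, and each replica carries a promotion tier that decides failover priority. Identifiers, region, instance class and credentials below are placeholders, not a recommended setup:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # credentials from your environment

# Criterion 9: StorageEncrypted is fixed here, at cluster creation time.
rds.create_db_cluster(
    DBClusterIdentifier="demo-aurora",
    Engine="aurora",                    # MySQL 5.6 compatible Aurora engine
    MasterUsername="admin",
    MasterUserPassword="change-me",
    StorageEncrypted=True,
)

# Criterion 7: replicas may be sized differently from the writer, and
# PromotionTier (0-15) controls which replica Aurora promotes first on failure.
rds.create_db_instance(
    DBInstanceIdentifier="demo-aurora-replica-1",
    DBClusterIdentifier="demo-aurora",
    Engine="aurora",
    DBInstanceClass="db.r3.large",      # can differ from the writer's class
    PromotionTier=0,                    # preferred failover target
)
```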

Solved: Decentralized Puppet Asset Management – factbeat

Using a decentralized (master-less) Puppet stack has its benefits for dynamic, fast-morphing environments.

Yet you’d still love to get all changes made to your environment recorded in a central repo.

Check out factbeat from the Elasticsearch community. It’s a Beat that ships Puppet Facter facts to Elasticsearch, where they can be stored, analyzed, displayed and compared over time.

Facter can be easily customized to ship new types of configuration information as your heart desires.
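
As a sketch of how that could look: Facter also picks up “external facts” – executables dropped into its facts.d directory (for example /etc/facter/facts.d/) whose stdout it parses as key=value pairs – so a small script like the one below (fact names and file path are made up) would flow through factbeat into Elasticsearch like any built-in fact:

```python
#!/usr/bin/env python3
# External fact sketch: Facter runs executables in its facts.d directory and
# reads key=value lines from stdout. Names and paths here are illustrative.
import os

def deployed_app_version():
    version_file = "/etc/myapp/VERSION"   # placeholder path
    if os.path.exists(version_file):
        with open(version_file) as f:
            return f.read().strip()
    return "unknown"

print("myapp_version=%s" % deployed_app_version())
print("puppet_mode=masterless")
```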

What are you using?

Open Source Security Validation Plug-in [Cool!]

WhiteSource’s New Selection Tool Helps Developers Choose Better Open Source Components


State of the ‘Spark’

I first got hands-on with Apache Spark about a year ago and it seemed cool. Yet going through my updated quick notes here, I felt myself falling in love with it 😎 It has grown a lot in integration options as well as features.

  1. The Zeppelin notebook IDE checks for syntax errors, shows you the data, and lets you submit jobs to a Spark cluster
  2. Scala is the default language, but Spark can also be used from Python, SQL and others
  3. Spark is newer than Hadoop MapReduce and positioned to replace it
  4. Spark optimizes data handling by keeping data in memory and reduces data movement across cluster nodes using partitions
  5. Runs on top of the JVM
  6. Scala is based on functional programming, where you would write X = collection of Y filtered by… instead of a for loop over Y with an if that adds to X (see the sketch after this list)
  7. Spark uses RDDs – Resilient Distributed Datasets: fault-tolerant collections of elements that can be operated on in parallel to produce the data processing we want
  8. Spark supports many data sources and formats: Hive, JSON, Cassandra, Elasticsearch
  9. Spark can be used with MLlib for machine learning
  10. Spark Streaming allows DataFrame manipulations on the fly – letting you write streaming jobs the same way you write batch jobs. It supports Java, Scala and Python.
  11. SparkR lets you interact with Spark via R; it is still not fully functional
  12. You can use these to submit Spark jobs: EMR steps, Lambda, AWS Data Pipeline, Airflow, Zeppelin, RStudio
  13. You can reduce cost and keep data off the cluster by storing it on S3 and using EMRFS
  14. In AWS you can hook Spark up to DynamoDB, RDS, Kinesis and many other services
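
As a tiny taste of points 6–8 above, here is a PySpark sketch showing the functional filter style, an RDD transformation, and a DataFrame loaded from JSON and queried with SQL. The S3 path and column names are invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("notes-demo").getOrCreate()
sc = spark.sparkContext

# Points 6/7: "X = collection of Y filtered by ..." instead of a for-loop with if/append.
numbers = sc.parallelize(range(1000000))   # an RDD, partitioned across the cluster
evens_squared = numbers.filter(lambda n: n % 2 == 0).map(lambda n: n * n)
print(evens_squared.take(5))

# Point 8: DataFrames load from many sources (JSON here) and can be queried with SQL.
events = spark.read.json("s3://my-bucket/events/*.json")   # placeholder path and schema
events.createOrReplaceTempView("events")
spark.sql("SELECT user_id, COUNT(*) AS hits FROM events GROUP BY user_id").show(10)

spark.stop()
```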