AWS Aurora vs. AWS RDS MySQL – 12 Criteria Power Check-List


First, take a look at the recent AWS April 2016 Webinar Series – Migrating your Databases to Aurora and the AWS June 2016 Webinar Series – Amazon Aurora Deep Dive – Optimizing Database Performance session led by Puneet Agarwal.

Here are the main reasons I would consider for choosing AWS Aurora over AWS RDS MySQL:

  1. Faster recovery from instance failure (5x or more vs. MySQL)
  2. Consistently lower impact on the primary
  3. You need additional throughput (theoretically 5x for the same resources vs. MySQL). This is achieved by decoupling the cache and storage sub-systems and spreading them across many nodes, as well as committing to the log first while the DB manipulation is done asynchronously.
  4. You use, or can migrate to, MySQL 5.6
  5. You are comfortable with the Aurora I/O mechanism (16K for read, 4K for write, all can be batched if smaller)
  6. You get more replicas (maximum of 15 vs. 5 in MySQL)
  7. You can prioritise recovery replica targets and set replicas with a different size than the master
  8. You need virtually no replication lag, since replica nodes share the same storage the master uses
  9. You are able to decide about encryption at rest at DB creation
  10. You accept working with the InnoDB engine alone
  11. You want to eliminate the need for cache warming
  12. You can allow an additional 20% in pricing over MySQL to gain all the above 🙂
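
To make item 7 a bit more concrete, here is a minimal sketch of how a failover target could be chosen. It assumes the behaviour as I understand it from the AWS documentation: the replica with the lowest promotion tier number wins, and ties go to the larger instance. The `Replica` class, the replica names and the `size_rank` field are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str            # hypothetical instance identifier
    promotion_tier: int  # lower number = higher failover priority
    size_rank: int       # hypothetical: larger number = bigger instance class


def pick_failover_target(replicas):
    """Pick the replica with the lowest tier; break ties by larger size."""
    return min(replicas, key=lambda r: (r.promotion_tier, -r.size_rank))


replicas = [
    Replica("reporting-replica", promotion_tier=10, size_rank=1),
    Replica("standby-small", promotion_tier=1, size_rank=2),
    Replica("standby-large", promotion_tier=1, size_rank=4),
]

target = pick_failover_target(replicas)
print(target.name)
```

With tiers set this way, the small reporting replica is only promoted if both standbys are gone.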

Solved: DE-Centralized Puppet Asset Management – factbeat

Using a DE-Centralized (Master-Less) Puppet stack has its benefits for dynamic, fast-morphing environments.

Yet you’d still love to get all changes made to your environment recorded in a central repo.

Check out factbeat from the Elasticsearch community. It’s a beat that ships Puppet Facter facts to Elasticsearch, where they can be stored, analyzed, displayed and compared over time.

Facter can be easily customized to ship new types of configuration information as your heart desires.
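
One way to customize Facter is an external fact: an executable that prints `key=value` lines on stdout. The sketch below only renders that output format. The fact names and values are made up for illustration, and how factbeat picks them up depends on your Facter setup.

```python
def render_external_facts(facts):
    """Render a dict as key=value lines, one fact per line."""
    lines = []
    for key, value in sorted(facts.items()):
        lines.append(f"{key}={value}")
    return "\n".join(lines)


# Hypothetical facts describing a deployment.
example_facts = {"app_release": "2016.07.1", "deploy_role": "worker"}

output = render_external_facts(example_facts)
print(output)
```

Saved as an executable script in Facter's external facts directory (`facts.d`), the printed lines should show up as additional facts.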

What are you using?

Open Source Security Validation Plug-in [Cool!]

WhiteSource’s New Selection Tool Helps Developers Choose Better Open Source Components


State of the ‘Spark’

I first got hands-on with Apache Spark about a year ago and it seemed cool. Yet going through my updated quick notes here, I feel I'm falling in love with it 😎 It has grown a lot in integration options as well as features.

  1. The Zeppelin IDE checks for syntax errors, shows you the data and lets you submit jobs to a Spark cluster
  2. Scala is the default language, but Spark can also be used from Python, SQL and others
  3. Spark is newer than Hadoop MapReduce and is positioned to replace it
  4. Spark optimizes data shifting using memory mapping and uses partitions to reduce moving data across cluster nodes
  5. Runs on top of the JVM
  6. Scala is based on functional programming, where you would write "X = collection of Y filtered by ..." instead of a for loop over Y with "if ... then add to X"
  7. Spark uses RDDs – Resilient Distributed Datasets: fault-tolerant collections of elements that can be operated on in parallel to produce the data processing we want
  8. Spark supports many formats and sources for the data: Hive, JSON, Cassandra, Elasticsearch
  9. Spark can be used with MLlib for machine learning
  10. Spark Streaming allows data frame manipulations on the fly – letting you write streaming jobs the same way you write batch jobs. It supports Java, Scala and Python.
  11. SparkR lets you interact with Spark via R. Still not fully functional
  12. You can use any of these to submit Spark jobs: EMR step, Lambda, AWS Data Pipeline, Airflow, Zeppelin, RStudio
  13. You can reduce cost by keeping data off the cluster on S3, using EMRFS
  14. In AWS you can hook Spark up with DynamoDB, RDS, Kinesis and many others
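
The functional style from item 6 can be sketched in plain Python, without a Spark cluster. The data and the threshold are made up. In PySpark the same predicate would typically be passed to `filter` on an RDD, but this sketch sticks to built-in lists.

```python
readings = [3, 18, 7, 42, 11]  # hypothetical input collection (the "Y")

# Imperative style: loop over Y, test each element, append to X.
large_loop = []
for value in readings:
    if value > 10:
        large_loop.append(value)

# Functional style: X = collection of Y filtered by a predicate.
large_filtered = list(filter(lambda value: value > 10, readings))

print(large_loop == large_filtered)
```

Both variants yield the same elements. The functional one only states what to keep, which is what lets an engine like Spark decide how to split the work across partitions.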

AWS IAM Nirvana: AWS API Gateway hooks with AWS Lambda and IAM

What I liked:

  1. Complete walk-through of the AWS Cognito flow: authorise against DynamoDB, ask Cognito for a token and ID, then ask Cognito for AWS credentials, derived from an AWS role, that last an hour.
  2. Independent IAM check that is run by AWS: the app tries to act without running any authorisation logic itself. The authorisation logic is embedded in the AWS policy on each resource the app tries to reach and is cross-checked with the AWS role the session got via Cognito.
  3. Strong anti-data-theft system: the access policy allows access to items only if the item’s user ID attribute matches the current Cognito session ID. So user A cannot dump details of user B…
  4. AWS API Gateway can automatically create a client SDK based on your app for many programming languages.
  5. You can hide the fact your SDK uses AWS credentials.
  6. Integration with as a source repository for API document spec.
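
The per-user access rule from item 3 can be sketched as a policy statement plus a toy check. The statement uses the `dynamodb:LeadingKeys` condition key and the `${cognito-identity.amazonaws.com:sub}` policy variable. The table name, the identity IDs and the `is_allowed` helper are hypothetical, and the helper is a deliberately simplified stand-in for what IAM evaluates on the AWS side, not a reimplementation of it.

```python
COGNITO_SUB_VARIABLE = "${cognito-identity.amazonaws.com:sub}"

# Policy statement attached to the role the session gets via Cognito.
policy_statement = {
    "Effect": "Allow",
    "Action": ["dynamodb:GetItem", "dynamodb:Query"],
    "Resource": "arn:aws:dynamodb:*:*:table/UserNotes",  # hypothetical table
    "Condition": {
        "ForAllValues:StringEquals": {
            "dynamodb:LeadingKeys": [COGNITO_SUB_VARIABLE]
        }
    },
}


def is_allowed(statement, caller_identity_id, requested_partition_keys):
    """Toy check: every requested partition key must equal the caller's ID."""
    condition = statement["Condition"]["ForAllValues:StringEquals"]
    allowed_keys = [
        key.replace(COGNITO_SUB_VARIABLE, caller_identity_id)
        for key in condition["dynamodb:LeadingKeys"]
    ]
    return all(key in allowed_keys for key in requested_partition_keys)


own_items = is_allowed(policy_statement, "us-east-1:user-a", ["us-east-1:user-a"])
other_items = is_allowed(policy_statement, "us-east-1:user-a", ["us-east-1:user-b"])
print(own_items, other_items)
```

The app itself never runs a check like `is_allowed`. The sketch only illustrates why a request from user A for user B's items is rejected.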

AWS Lambda 5 cool features

What I liked:

  1. Versions and aliases (prod as an alias can point to the active function)
  2. Scheduling of actions
  3. Support for Python and others
  4. Dynamic – No need to set up servers
  5. VPC support – can communicate with other services you have internally
  6. Integration with CloudWatch (inspect and analyze incoming log entries)
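
As a rough illustration of item 6, here is a minimal sketch of a Python handler that inspects log entries delivered by a CloudWatch Logs subscription. It assumes the usual event shape, where `event["awslogs"]["data"]` holds base64-encoded, gzip-compressed JSON with a `logEvents` list. The "ERROR" filter, the log group name and the `build_sample_event` helper are made up so the sketch can be tried locally.

```python
import base64
import gzip
import json


def lambda_handler(event, context):
    """Decode a CloudWatch Logs event and collect messages containing ERROR."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    errors = [
        entry["message"]
        for entry in payload["logEvents"]
        if "ERROR" in entry["message"]
    ]
    return {
        "logGroup": payload["logGroup"],
        "errorCount": len(errors),
        "errors": errors,
    }


def build_sample_event(messages):
    """Hypothetical helper: wrap messages the way the handler expects them."""
    payload = {
        "logGroup": "/hypothetical/app",
        "logEvents": [
            {"id": str(index), "timestamp": 0, "message": message}
            for index, message in enumerate(messages)
        ],
    }
    raw = json.dumps(payload).encode("utf-8")
    data = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return {"awslogs": {"data": data}}


result = lambda_handler(build_sample_event(["INFO started", "ERROR disk full"]), None)
print(result["errorCount"])
```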