I have seen a peered Mongo! https://www.mongodb.com/blog/post/introducing-vpc-peering-for-mongodb-atlas
Reading through "Why we use Terraform and not Chef, Puppet, Ansible, SaltStack, or CloudFormation", you will see why immutable infrastructure is so powerful.
But I would not drop Ansible altogether…
Here are some rules to look into:
- Build it immutable – because you can then scale easily, recover easily, and have a consistent source for deploying exactly what you tested
- Use Terraform to create the immutable infrastructure itself
- Use Packer to create images that can be deployed anywhere – AWS, GCE, Vagrant, OpenStack
- Use Ansible to script changes on top of your images if needed. Ansible is not immutable by itself, but it gives you a cleaner, reusable baseline to replace your scattered scripts
- In Ansible, use modules before you script, and roles before you duplicate earlier effort. Playbooks are your replacement for scripts.
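To make the bake-then-replace loop concrete, here is a minimal sketch in Python, assuming a Packer template named web.json and a Terraform setup that takes an ami_id variable – both names are assumptions, not a prescribed layout:

```python
# Hypothetical immutable pipeline: bake an image with Packer, then roll
# it out with Terraform by replacing instances rather than mutating them.
import subprocess

def bake_image(template="web.json"):
    # 'packer build -machine-readable' emits parseable CSV-ish output;
    # the artifact line ends with "region:ami-id".
    out = subprocess.run(
        ["packer", "build", "-machine-readable", template],
        check=True, capture_output=True, text=True,
    ).stdout
    for line in out.splitlines():
        if ",artifact,0,id," in line:
            return line.rsplit(",", 1)[-1].split(":")[-1]
    raise RuntimeError("no artifact id found in packer output")

def deploy(ami_id):
    # Feed the freshly baked AMI into Terraform; the instances built
    # from the old image are replaced, never patched in place.
    subprocess.run(
        ["terraform", "apply", "-auto-approve", f"-var=ami_id={ami_id}"],
        check=True,
    )

if __name__ == "__main__":
    deploy(bake_image())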
Sweet new Kubernetes federation features!
- Clusters can span regions and cloud providers (AWS, GCP)
- Cluster setup takes just 3 commands, including the overlay (internal) network
- Simple Kubernetes installation using plain yum/apt-get, or run it directly from the GCP platform (soon)
- A single VIP points to a load balancer that can forward traffic to federated cluster nodes across regions
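As a quick way to inspect such a multi-region setup from the outside, here is a hypothetical sketch using the kubernetes Python client: it walks every kubeconfig context and prints each node with its region label. The context names and the region label are assumptions based on common defaults of that era:

```python
# Hypothetical check: list nodes and their regions for every kubeconfig
# context, to verify clusters spanning regions and cloud providers.
from kubernetes import client, config

contexts, _ = config.list_kube_config_contexts()
for ctx in contexts:
    name = ctx["name"]
    config.load_kube_config(context=name)
    v1 = client.CoreV1Api()
    for node in v1.list_node().items:
        region = (node.metadata.labels or {}).get(
            "failure-domain.beta.kubernetes.io/region", "unknown")
        print(f"{name}: {node.metadata.name} in {region}")
```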
First, take a look at the recent AWS April 2016 Webinar Series – Migrating your Databases to Aurora, and the AWS June 2016 Webinar Series – Amazon Aurora Deep Dive – Optimizing Database Performance, led by Puneet Agarwal. In short, Aurora is a good fit if you:
- Need faster recovery from instance failure (5x or more vs. MySQL)
- Want consistently lower replication impact on the primary
- Need additional throughput (theoretically 5x for the same resources vs. MySQL) – achieved by decoupling the cache and storage sub-systems and spreading them across many nodes, and by committing the log first while DB manipulation is done asynchronously
- Are using, or can migrate to, MySQL 5.6
- Are comfortable with the Aurora I/O mechanism (16K reads, 4K writes, both batched when smaller)
- Want more replicas (up to 15 vs. 5 in MySQL)
- Want to prioritise which replicas are promoted on recovery, and to size replicas differently from the master (see the boto3 sketch after this list)
- Need virtually no replication lag – replica nodes share the same storage the master uses
- Want to decide about encryption at rest at DB creation time
- Accept working with the InnoDB engine alone
- Want to eliminate the need for cache warming
- Accept ~20% higher pricing over MySQL to gain all the above 🙂
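For a concrete taste of two of these points – encryption decided at creation time, and replicas that differ in size and failover priority – here is a hypothetical boto3 sketch; all identifiers, instance classes, and credentials are made-up examples:

```python
# Hypothetical sketch: an encrypted Aurora cluster with a writer and a
# smaller replica that is first in line for failover.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Encryption at rest can only be chosen here, at cluster creation.
rds.create_db_cluster(
    DBClusterIdentifier="demo-aurora",
    Engine="aurora",  # MySQL 5.6-compatible
    MasterUsername="admin",
    MasterUserPassword="change-me-please",
    StorageEncrypted=True,
)

# Writer instance.
rds.create_db_instance(
    DBInstanceIdentifier="demo-aurora-writer",
    DBClusterIdentifier="demo-aurora",
    Engine="aurora",
    DBInstanceClass="db.r4.xlarge",
)

# Replica: a different size than the writer is allowed, and
# PromotionTier decides who gets promoted first on failover (lower wins).
rds.create_db_instance(
    DBInstanceIdentifier="demo-aurora-replica-1",
    DBClusterIdentifier="demo-aurora",
    Engine="aurora",
    DBInstanceClass="db.r4.large",
    PromotionTier=0,
)
```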
Using a decentralized (masterless) Puppet stack has its benefits for dynamic, fast-morphing environments.
Yet you’d still love to get all changes made to your environment recorded in a central repo.
Facter can be easily customized to ship new types of configuration information as your heart desires.
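For example, Facter picks up external facts from executables in its facts.d directory that print key=value lines to stdout. A hypothetical Python fact that records which revision of your central repo a node is running might look like this (the fact name and repo path are assumptions):

```python
#!/usr/bin/env python
# Hypothetical external fact: drop this file (made executable) into
# /etc/facter/facts.d/ and Facter exposes each printed key=value line
# as a custom fact on the node.
import subprocess

def deployed_git_rev(repo="/opt/app"):
    # Report the short git revision currently checked out on this node.
    try:
        return subprocess.check_output(
            ["git", "-C", repo, "rev-parse", "--short", "HEAD"],
            text=True).strip()
    except Exception:
        return "unknown"

print("app_git_rev=%s" % deployed_git_rev())
```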
What are you using?
I first got hands-on with Apache Spark about a year ago and it seemed cool. Going through my updated quick notes here, I found myself falling in love with it 😎 It has grown a lot in integration options as well as features.
- The Zeppelin IDE checks for syntax errors, shows you the data, and lets you submit jobs to a Spark cluster
- Scala is the default language, but Spark can be used from Python, SQL, and others
- Spark is newer than Hadoop MapReduce and positioned to replace it
- Spark optimizes data shifting by working in memory, and uses partitions to reduce moving data across cluster nodes
- Runs on top of the JVM
- Scala is based on functional programming, where you write X = collection of Y filtered by… instead of a for loop over Y with an if that adds to X (see the sketch after this list)
- Spark uses RDDs – Resilient Distributed Datasets: fault-tolerant collections of elements that can be operated on in parallel to produce the data processing we want
- Spark supports many data sources: Hive, JSON, Cassandra, Elasticsearch
- Spark can be used with MLlib for machine learning
- Spark Streaming allows DataFrame manipulations on the fly – letting you write streaming jobs the same way you write batch jobs. It supports Java, Scala, and Python
- SparkR lets you interact with Spark via R, though it is still not fully functional
- You can use any of these to submit Spark jobs: an EMR step, Lambda, AWS Data Pipeline, Airflow, Zeppelin, RStudio (sketched below)
- You can reduce cost by keeping data off the cluster, storing it on S3 and reading it via EMRFS
- In AWS you can hook Spark up to DynamoDB, RDS, Kinesis, and many others
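To make the functional style concrete, here is a minimal PySpark sketch, assuming a local Spark 2.x install; people.json is a hypothetical input file:

```python
# Minimal PySpark sketch: filter a collection functionally instead of
# looping with if/append, then read a JSON source as a DataFrame.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("quick-notes").getOrCreate()

# RDD: a fault-tolerant collection operated on in parallel.
evens = spark.sparkContext.parallelize(range(100)).filter(lambda n: n % 2 == 0)
print(evens.take(5))  # [0, 2, 4, 6, 8]

# DataFrames read many sources directly; JSON shown here.
# people.json is assumed to hold {"name": ..., "age": ...} rows.
people = spark.read.json("people.json")
people.filter(people.age > 30).show()

spark.stop()
```

And one of the submission options above – an EMR step – can be sketched with boto3; the cluster id and S3 paths are placeholders, not real values:

```python
# Hypothetical EMR step submission for a Spark job; the script itself
# lives on S3 (read via EMRFS), keeping data and code off the cluster.
import boto3

emr = boto3.client("emr", region_name="us-east-1")
emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",
    Steps=[{
        "Name": "nightly-aggregation",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "s3://my-bucket/jobs/aggregate.py"],
        },
    }],
)
```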