AWS Athena says No so beautifully

Amazon Athena lets you run ANSI SQL directly against the data in your S3 buckets, and it supports a multitude of file and data formats

Here are my insights taken from a comprehensive YouTube session led by Abhishek Sinha

  • No ETL needed
  • No Servers or instances
  • No warmup required
  • No data load before querying
  • No need for a disaster recovery plan (DRP), since it’s multi-AZ

Athena uses Presto (an in-memory distributed query engine) and Hive (DDL for creating tables that reference your S3 data)
You pay for the amount of data scanned, so you can optimize performance as well as cost if you:

  1. Compress your data
  2. Store it in a columnar format
  3. Partition it
  4. Convert it to Parquet / ORC format
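
To make the pay-per-scan idea concrete, here is a minimal Python sketch of how the steps above shrink the bill. The price of 5 USD per TB scanned and the reduction ratios are assumptions for illustration only, and the function names are hypothetical; check the current Athena pricing page for real figures.

```python
# Assumed values for illustration only; not official pricing.
PRICE_PER_TB_USD = 5.0      # assumed price per TB of data scanned
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimated cost in USD of a query that scans `bytes_scanned` bytes."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


def scanned_after_optimizing(raw_bytes, compression_ratio=1.0,
                             column_fraction=1.0, partition_fraction=1.0):
    """Bytes a query would scan after each optimization is applied.

    compression_ratio:  compressed size / raw size
    column_fraction:    share of columns the query actually reads
    partition_fraction: share of partitions the query actually touches
    """
    return int(raw_bytes * compression_ratio * column_fraction * partition_fraction)


if __name__ == "__main__":
    raw = 1024 ** 4  # 1 TB of raw data
    optimized = scanned_after_optimizing(raw, 0.25, 0.2, 0.1)
    print("raw scan cost:      ", estimate_query_cost(raw))
    print("optimized scan cost:", estimate_query_cost(optimized))
```

The three fractions multiply, which is why combining compression, columnar storage and partitioning pays off far more than any one of them alone.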

Querying in Athena:

  1. You can query Athena via the AWS Console (dozens of queries can run in parallel) or using any JDBC-enabled tool such as SQL Workbench
  2. You can stream Athena query results into S3 or Amazon QuickSight (SPICE)
  3. Creating a table for querying in Athena is merely writing a schema that you later refer to
  4. The table schemas you create for queries are fully managed and highly available
  5. A query acts as a route to the data, so every time you execute it Athena re-evaluates everything in the relevant buckets
  6. To create a partition you specify a key and value, then a bucket and a prefix that point to the data belonging to that partition
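
The partition step in the last item can be sketched as a small Python helper that builds the Hive-style DDL statement you would then run in Athena. The table, key, bucket and prefix names below are hypothetical, and the helper itself is only an illustration, not part of any AWS SDK.

```python
def add_partition_ddl(table, key, value, bucket, prefix):
    """Build an ALTER TABLE ... ADD PARTITION statement.

    The statement maps one partition key/value pair to the S3 location
    (bucket + prefix) that holds the data for that partition.
    """
    location = "s3://{}/{}/".format(bucket, prefix.strip("/"))
    return "ALTER TABLE {} ADD PARTITION ({}='{}') LOCATION '{}'".format(
        table, key, value, location
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(add_partition_ddl("logs", "dt", "2017-01-01",
                            "my-bucket", "logs/dt=2017-01-01"))
```

Queries that filter on the partition key then only scan the matching prefix instead of the whole bucket.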

Just note that Athena serves specific use cases (such as non-urgent ad-hoc queries), while other Big Data tools fulfill other needs: AWS Redshift is aimed at the quickest query times over large amounts of structured data, and AWS Kinesis Analytics is aimed at querying rapidly streaming data.

Want to learn more on Big Data and AWS? Visit http://allcloud.io

Kubernetes: making it Highly Available

You can set up a highly available Kubernetes cluster by adding worker node pools and master replicas.

That’s true as of Kubernetes version 1.5.2. It is supported using the kube-up/kube-down scripts for GCE (as alpha): http://blog.kubernetes.io/2017/02/highly-available-kubernetes-clusters.html?m=1

On AWS, HA Kubernetes clusters are supported through kops:

http://kubecloud.io/setup-ha-k8s-kops/

GCP Big Table – main facts

Is the basis of many Google products
NoSQL wide-column storage system

Does not offer secondary indexes; the only index is on the row key, which you can use for range scans

Its design is the basis for HBase, the Hadoop ecosystem’s big data store

You pay for storage separately

You pay for a minimum of 3 nodes and can expand as you need

Nodes are needed just for read / write – not for storage

Supports massive amounts of reads / writes, but offers no locking or transaction support

Is not completely highly available, since data is sometimes unavailable while it is being moved around

Great for big queries, less so for short, rapid ones

https://cloud.google.com/bigtable/
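
To picture what having only a single range index means, here is a toy Python model: rows are kept sorted by key, so the only efficient lookups are by key or by key range. This is an illustration of the idea and not the Bigtable client API; the class and method names are hypothetical.

```python
import bisect


class SortedRowStore:
    """Toy in-memory store that keeps rows ordered by row key."""

    def __init__(self):
        self._keys = []   # row keys, always kept sorted
        self._rows = {}   # row key -> value

    def put(self, key, value):
        # Insert new keys in sorted position; overwrite existing ones.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    store = SortedRowStore()
    for key in ["user2#0001", "user1#0002", "user1#0001", "user3#0001"]:
        store.put(key, {"payload": key})
    # All rows for user1: a single contiguous range of keys.
    print(store.scan("user1#", "user1$"))
```

Since everything hangs off the row key, the key layout is the main design decision: data you want to read together should share a key prefix.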

Why managers are pushed away by Talents and Leaders

Here are the main points in this brilliant session by Ade McCormack on the new age of workplace, employment and skills

Managers were needed in the Industrial Age to keep an eye on employees who did not like the mechanical tasks they had to do
Now automation has taken over those jobs

There is no room for laziness

The need is for talented people and leaders

Those are passionate people who are eager to do much more than any manager could demand

Talents look for innovation, mobility/flexibility/fun-ability, work/life balance, and playing with other great people, all of which are worth far more than money, yet are so hard to create

When you create a great workplace, great people will join, driving great customers to you as they provide massive value

Nowadays risk is tied to value

You have to bring risk into your work plans to make real progress

As you play in risky arenas you need some peripheral sensing: data about new risks

You need to spend time trying out the new risk-related technologies even if there is no guarantee they will materialize into actual danger

Leaders must make sure their teams face no interference in reaching and maintaining a state of “flow”: a joyful focus on exercising their maximum abilities

Treat your career as a lean startup: choose what you are passionate about, are highly skilled at, and that has market demand

Ask yourself every day: am I working in a place that allows me to reach my maximum market value?

How hard do you practice and learn to become world class in your arena?