Tag: Big Data

AWS Aurora vs. AWS RDS MySQL: 12-Criteria Power Checklist

First, take a look at these recent sessions: the AWS April 2016 Webinar Series – Migrating your Databases to Aurora, and the AWS June 2016 Webinar Series – Amazon Aurora Deep Dive – Optimizing Database Performance, led by Puneet Agarwal.

Here are the main criteria I would consider when choosing AWS Aurora over AWS RDS MySQL:

  1. Faster recovery from instance failure (5x or more vs. MySQL)
  2. Consistently lower impact on the primary replica
  3. Need for additional throughput (theoretically 5x for the same resources vs. MySQL). This is achieved by decoupling the cache and storage sub-systems and spreading them across many nodes, as well as committing the log first while DB manipulation is done asynchronously.
  4. Using, or able to migrate to, MySQL 5.6
  5. Comfortable with the Aurora I/O mechanism (16K for reads, 4K for writes; smaller operations can be batched)
  6. Need more replicas (maximum of 15 vs. 5 in MySQL)
  7. Ability to prioritize recovery replica targets and to set replicas with a different size than the master
  8. Need for virtually no replication lag – since replica nodes share the same storage the master uses
  9. Ability to decide about encryption at rest at DB creation time
  10. Accept working with the InnoDB engine alone
  11. Want to eliminate the need for cache warming
  12. Accept roughly 20% additional pricing over MySQL to gain all the above 🙂
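Point 5 above can be made concrete with a toy calculation. The sketch below is my own illustration (not AWS code) of why batching sub-4K writes into single 4 KiB I/Os cuts the write-I/O count:

```python
import math

WRITE_IO_BYTES = 4 * 1024  # Aurora issues writes in 4 KiB units

def write_ios(record_sizes, batched=True):
    """Count 4 KiB write I/Os needed for a list of record sizes (bytes)."""
    if batched:
        # Several small records can be packed into one 4 KiB I/O
        return math.ceil(sum(record_sizes) / WRITE_IO_BYTES)
    # Without batching, every record costs at least one I/O
    return sum(math.ceil(s / WRITE_IO_BYTES) for s in record_sizes)

records = [512] * 16  # sixteen 512-byte log records
print(write_ios(records, batched=False))  # 16 separate I/Os
print(write_ios(records, batched=True))   # 2 I/Os (8 KiB total / 4 KiB)
```

The same batching logic applies to 16K reads; the win grows as individual operations get smaller relative to the I/O unit.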

Solved: Decentralized Puppet Asset Management – factbeat

Using a decentralized (master-less) Puppet stack has its benefits for dynamic, fast-morphing environments.

Yet you'd still love to get all changes made to your environment recorded in a central repo.

Check out factbeat from the Elasticsearch community. It's a beat that ships Puppet Facter facts to Elasticsearch, where they can be stored, analyzed, displayed and compared over time.

Facter can be easily customized to ship new types of configuration information, as your heart desires.
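To see the shape of what a beat like this ships, here is a minimal stdlib-only sketch (my own illustration, not factbeat's actual code) that turns a dict of Facter facts into an Elasticsearch bulk-index payload, timestamped so facts can be compared over time. The index name and fact values are hypothetical:

```python
import json
from datetime import datetime, timezone

def facts_to_bulk(facts, index="puppet-facts"):
    """Build a newline-delimited Elasticsearch bulk payload: one action
    line followed by one document line, timestamped for time-series use."""
    doc = dict(facts)
    doc["@timestamp"] = datetime.now(timezone.utc).isoformat()
    action = {"index": {"_index": index}}
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"

# Example facts roughly as `facter --json` might emit them (illustrative)
facts = {"hostname": "web01", "os": {"family": "RedHat"}, "memorysize_mb": 7812}
payload = facts_to_bulk(facts)
print(payload)
```

The payload could then be POSTed to the cluster's `_bulk` endpoint; factbeat handles the shipping, retry, and scheduling parts for you.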

What are you using?

State of the 'Spark'

I first got hands-on with Apache Spark about a year ago, and it seemed cool. Yet going through my updated quick notes here, I felt myself falling in love with it 😎 It has grown a lot in both integration options and features.

  1. The Zeppelin IDE checks for syntax errors, shows you the data, and lets you submit jobs to a Spark cluster
  2. Scala is the default language, but Spark can also be used from Python, SQL and others
  3. Spark is newer than Hadoop and positioned to replace it
  4. Spark optimizes data shifting using memory mapping and reduces data movement across cluster nodes using partitions
  5. Runs on top of the JVM
  6. Scala is based on functional programming, where you would write "X = collection of Y filtered by…" instead of "for each item in Y, if condition then add to X"
  7. Spark uses RDDs – Resilient Distributed Datasets: fault-tolerant collections of elements that can be operated on in parallel to produce the data processing we want
  8. Spark supports many formats for the data: Hive, JSON, Cassandra, Elasticsearch
  9. Spark can be used with MLlib for machine learning
  10. Spark Streaming allows data frame manipulations on the fly – letting you write streaming jobs the same way you write batch jobs. It supports Java, Scala and Python.
  11. SparkR lets you interact with Spark via R. It is still not fully functional.
  12. You can submit Spark jobs via: an EMR step, Lambda, AWS Data Pipeline, Airflow, Zeppelin, R Studio
  13. You can reduce cost and keep data off the cluster by storing it on S3 and using EMRFS
  14. In AWS you can hook Spark up with DynamoDB, RDS, Kinesis and many others
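Point 6's functional style can be sketched in plain Python; the equivalent PySpark RDD call is shown in a comment (it assumes a running SparkContext `sc`, which this snippet does not create):

```python
# Imperative style: loop, test, append
def evens_loop(ys):
    xs = []
    for y in ys:
        if y % 2 == 0:
            xs.append(y)
    return xs

# Functional style: declare the transformation, as you would in Spark
def evens_functional(ys):
    return list(filter(lambda y: y % 2 == 0, ys))

data = [1, 2, 3, 4, 5, 6]
print(evens_loop(data))        # [2, 4, 6]
print(evens_functional(data))  # [2, 4, 6]

# The same shape in PySpark, distributed across the cluster:
# rdd = sc.parallelize(data)
# evens = rdd.filter(lambda y: y % 2 == 0).collect()
```

The declarative form is what lets Spark plan and distribute the work: you state *what* to compute, and the engine decides *where* each partition runs.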

IBM, Facebook and others unlock Threat data for the sake of humanity..

Check this out…


IBM and Facebook as well as others are starting to contribute to a massive big data based repository of threat related information.

I had an internal startup for some time that targeted security as well as general operational data, to point out trends needing attention, such as disk series reaching failure points, apps that suddenly morph, and the like.

Another topic was cleansing the data from any personal or internal information by tokenizing it.

I stopped this startup after meeting someone who was doing the same, who pointed out that, on one hand, there is already enough data (and per this post we now have much more of it), and on the other hand, companies would agree to share cleansed data but would not be able to do so due to regulations that take time to defuse.

In any case you now have lots of data to sift through if you are a hungry data scientist…

Cloud Computing and SAAS Secrets revealed – 7 Consumption Economics Rules

This is important for any enterprise software producer to get – here is a summary of more than an hour of video nuggets going through the must-read book "Consumption Economics: The New Rules of Tech" by J.B. Wood, Todd Hewlin and Thomas Lah.

Your customers are approaching a consumption gap: due to shrinking revenue and resources they can't deploy your complex product fast enough to create the business value that justified paying for your costly products in the past, no matter how many powerful new features you offer. This makes the market ripe for price wars and disruption.

Customers cannot afford the long traditional time, required to deploy enterprise IT products.
Employees simply go directly to the SAAS based providers, ignoring any uneasy traditional solution.

There is lots of money to be made in SAAS – 200 million USD a year – but some of it will be extracted from existing solutions…

How should you, the solution provider, move to the new paradigm without hurting your margins too fast?

The 7 new rules for Enterprise IT Software providers:

1. Risk is shifting from the customer to the service provider, and from a few large CAPEX transactions to many small OPEX transactions. In the past the customer paid for IT solutions up front and owned the deployment risk; now it is the other way around.

2. Simplicity is KING. People prefer simpler, partial, good-enough solutions over full-range complex solutions; the latter are expensive in both price and deployment effort. The result is a downhill price war that non-SAAS suppliers can't win.

3. Users can finally drive tech decisions, because those services exist without mass dependency on corporate IT, and users will prefer SAAS.

4. Customer aggregators shrink the direct market. Rackspace, Amazon and others will aim to aggregate users who need your SAAS solutions… so you, as the solution creator, have fewer direct sales opportunities… and the aggregators will force you to bring your price down to match theirs… this means you could be commoditized…

5. Channel value reset. Global system integrators such as IBM and Infosys could capture a lot of value by aggregating customers, providing complete line-of-business solutions, and helping blur the lines between private and public cloud. Smaller VARs and OEMs are in danger and will need to reshape their workforce and services to live with smaller chunks of income, bringing value by launching consumption-based services, enabling business process change, and shifting from reselling to fee-for-service.

6. Tech pricing under pressure. You need a freemium / try-for-free model. You need to be able to sell at a very low price ($5 per user…) with upgrade options up to the highest level ($250 per user), instead of getting paid a huge total amount up front.
Customers will want to move from the rigid CAPEX fixed-price model, through an OPEX subscription model (which does not reflect actual use of the service, yet is cheaper than CAPEX), and finally land on consumption-based subscription, where users pay only for what they use, or perhaps for the outcome they get, over short periods of time. The latter will require both vendors and customers to adjust their monetary systems.
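The three pricing models can be contrasted with a toy calculation. All numbers below are illustrative, not from the book:

```python
def capex_cost(license_fee, months):
    """Fixed up-front license: paid once, regardless of actual use."""
    return license_fee

def opex_cost(monthly_fee, months):
    """Flat subscription: paid every month, regardless of actual use."""
    return monthly_fee * months

def consumption_cost(price_per_unit, usage_per_month):
    """Consumption-based: pay only for the units actually used each month."""
    return sum(price_per_unit * used for used in usage_per_month)

# A customer who ramps up slowly over a year (units consumed per month)
usage = [0, 0, 10, 20, 40, 60, 80, 100, 100, 100, 100, 100]

print(capex_cost(100_000, 12))      # 100000 – all paid up front, all risk on customer
print(opex_cost(5_000, 12))         # 60000 – flat, paid even in the zero-usage months
print(consumption_cost(75, usage))  # 53250 – cost tracks the value actually consumed
```

Under consumption pricing the customer pays nothing in the first two months, which is exactly the risk shift from rule 1: the vendor only earns as the customer succeeds.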

7. Behavioral data drives consumption: you, the supplier, use data gathered on what your customers want as they use your service to adjust and give more, leading to further revenue. Most SAAS providers focus only on acquiring customers and monetizing them, while neglecting engagement (where the service automatically offers the user new ways of using it) and viral expansion of both the use of the service and the customer base.

You want to move from being a master of complexity to a master of consumption, and widen your market reach.

We can see 3 main points here:

1. Customers will pay per use.
2. You have to focus on the end users, since those who are more successful as a result of using your services, will consume more, and hence pay you more.
3. The risk of generating value from your services moves from your users to you, the provider.

The old definition of 'deployment done' meant the sale was made and the initial setup of our software was completed, setting the customer up to pay us a maintenance fee.
The SAAS definition of 'deployment done' should mean that the customer is massively using our service, benefiting from the complete suite of options.
This means you have to become your customer's companion, guide and assistant in a successful journey of their business, using your SAAS offering.

Everything in the software industry needs to morph to support consumption-based use of the software.
Marketing, for example, should allow automatic, real-time, Amazon-like offering of additional services that match what the user, or other users like them, seems to need.

Services, though, will probably see the most fundamental effect. If we use statistics that assume the software business is worth 200 billion dollars, then the services business is 4 times bigger!
The services business addresses implementation, integration and maintenance. For many software vendors this sums up to 50% of their income! Moving to SAAS actually takes care of the complexity management that software vendors have within their products. That's exactly what Cloud Computing and SAAS inherently resolve, because there is no installation and integration effort, while software updates are expected to be delivered as part of the subscription model. The right way to defend the services golden egg is by transitioning from complexity management to the creation of real business value. So instead of advising on how to get your software going, you need to advise and assist your customers in getting business value through the use of your products. You need to look into what your most successful customers are doing every day with your software, and assist the others in doing the same.
So we should get out of the mental focus on margins, cost centers and such, and become masters of value creation for our customers. This requires us to break down the inter-services barriers and silos (consulting, support, education and others).
During this process we will need to change our own organizational structure and allow sharing of resources, information and knowledge. The most valuable skill sets will revolve around consulting for business value creation through the use of our products, from technical wizardry to business process mastery. We could then package this together into managed services offerings.
We'll need expertise in areas such as vertical industry, business, design, product, technical, consumption.
The real driver of consumption and sales will be the services organization rather than sales or marketing, as they will be the ones providing the consultancy aimed at increasing the customer's business value through increased consumption of the SAAS offering – gained by providing customers with education, support, professional services, etc.

Cloud computing's effects will not revolve around hosting, on-premise or off-premise; they will rather be driven by the consumption of services.

You don't want to import the current structure of your legacy software into the new model. You have to transform every aspect of your business, including people, processes, and technology as well as belief systems.

Like Apple, you need to do both innovation in the relevant areas, and also to serve your customers so they are successful.

To facilitate this transition you can't force it into people's minds; you need to keep asking the right questions…

GENESIS - Big Data Super Nova - Part One (Eve)

Big Data Super Nova Book on Kindle for FREE (next 23 hours)

I just published my first Novella on Amazon and would like to share it with you.

If you go to this link: http://amzn.to/17mLBi4 you can get it for free on your Kindle (in the next 24 hours or so)

*** I'd really appreciate if You Could Rate it and Write a Comment ***

It is called "GENESIS – Big Data Super Nova" (40 min read) – insights and ideas on the future of Big Data, wrapped in a Sci-Fi tech Novella:

*The Death of passwords
*Data dissolving agents
*The new security paradigm and the merge of humans and computers
*Inter Body Nano Bots, Brain Dumps, Light Speed Travel
*And more…