Konrad Scherer
THURSDAY, 10 MAY 2018

Getting started with ElasticSearch

Introduction

I manage an internal build system that creates a simple text file of key-value pairs with build statistics. These statistics are then processed by a fairly gnarly shell script. When I first saw this years ago, it looked like a perfect candidate for ElasticSearch, and I finally had time to look into it.

ElasticSearch and Kibana

ES is a text database and search engine, which is useful on its own, but it also has a neat frontend called Kibana that can be used to query and visualize the data. Since I manage the system, there was no need to set up Logstash to preprocess the data; I could just convert it to JSON myself.

Official Docker Images

The documentation for ElasticSearch covers installation using Docker, but there is one gotcha. The webpage that lists all the available Docker images at https://www.docker.elastic.co/ only lists the images that contain the starter X-Pack with a 30 day trial license. I ended up using the docker.elastic.co/elasticsearch/elasticsearch-oss image, which contains only open source content. Same for the Kibana image.
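
Pulling the open source images is straightforward; the 6.2.4 tag matches what the compose file below uses:

docker pull docker.elastic.co/elasticsearch/elasticsearch-oss:6.2.4
docker pull docker.elastic.co/kibana/kibana-oss:6.2.4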

Docker-compose

I wanted to run ES and Kibana on the same server, but if you do that with two separate docker run commands, the auto configuration of Kibana doesn't work. I also wanted a local volume to hold the data, so I created a simple docker-compose file:

---
version: '2'
services:
  kibana:
    image: docker.elastic.co/kibana/kibana-oss:6.2.4
    environment:
      SERVER_NAME: $HOSTNAME
      ELASTICSEARCH_URL: http://elasticsearch:9200
    ports:
      - 5601:5601
    networks:
      - esnet

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.2.4
    container_name: elasticsearch
    environment:
      discovery.type: single-node
      bootstrap.memory_lock: "true"
      ES_JAVA_OPTS: "-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - 9200:9200
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    networks:
      - esnet

volumes:
  esdata1:
    driver: local

networks:
  esnet:

Now I have ES and Kibana running on ports 5601 and 9200.
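
To bring the stack up and check that both services answer, something like this works (assuming the file above is saved as docker-compose.yml in the current directory):

docker-compose up -d
curl http://localhost:9200     # ElasticSearch returns basic cluster info
curl -I http://localhost:5601  # Kibana should answer on this port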

JSON output from Bash

I have a large collection of files in the form:

key1: value1
key2: value2
<etc>

Converting this to JSON should be simple, but there were a few surprises:

  • JSON does not allow single quotes, so wrapping each key and value in double quotes requires either echo with escaped " characters or printf, which I found cleaner.
  • JSON requires that the last element not have a trailing comma. I abused bash control characters by emitting a backspace to erase the last comma.
  • ElasticSearch would fail to parse the JSON when a value contained backslashes, so I used tr to delete all backslashes from the output.

The final code looks like this:

convert_to_json()
{
    local ARR=()
    local FILE=$1
    # Split each "key: value" line into alternating key/value entries
    while read -r LINE
    do
        ARR+=( "${LINE%%:*}" "${LINE##*: }" )
    done < "$FILE"

    local LEN=${#ARR[@]}
    echo "{"
    for (( i=0; i<LEN; i+=2 ))
    do
        printf '  "%s": "%s",' "${ARR[i]}" "${ARR[i+1]}"
    done
    # The backspace erases the trailing comma left by the loop above
    printf "\b \n}\n"
}

for FILE in "$@"; do
    # "local" is only valid inside a function, so assign directly here
    JSON=$(convert_to_json "$FILE" | tr -d '\\' )
    # unquoted echo collapses the multi-line JSON onto a single line
    echo $JSON
done
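
One way to get each converted document into ElasticSearch (the import mentioned in the next section) is a plain curl POST. A minimal sketch, where the index name builds and the type doc are illustrative assumptions, not names from my build system:

for FILE in "$@"; do
    JSON=$(convert_to_json "$FILE" | tr -d '\\' )
    # ES 6 requires the Content-Type header; index/type names are assumptions
    curl -s -H 'Content-Type: application/json' \
         -XPOST 'http://localhost:9200/builds/doc' \
         -d "$JSON"
done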

ElasticSearch type mapping

The data was being imported into ES properly, but when I tried to search and visualize it, I found it really hard to work with. Every field had been imported as a text and keyword type, which meant that the date and number fields could not be visualized as expected.

The solution was to create a mapping which assigns a type to each field in the document. If the numbers had not been sent as strings, ES would have converted them automatically, but I had dates in epoch seconds, which are indistinguishable from large numbers. Date parsing is its own challenge and ES supports many different date formats. In my specific case, epoch_second was the only date format required.
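
For illustration, the relevant fragment of such a mapping looks like this (start_time and duration are hypothetical field names; epoch_second is the ElasticSearch format name for dates given in seconds since the epoch):

"start_time": {
  "type": "date",
  "format": "epoch_second"
},
"duration": {
  "type": "long"
}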

I took the default mapping and added the type information to each field. I tried to apply this mapping to the existing index, but ES does not allow the mapping of a field to be changed, because that would change the interpretation of the data. The solution is to create a new index with the typed mapping and reindex the old index into it. This worked, and I was then able to visualize the data much more easily.
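
A sketch of that reindex flow, assuming the old index is called builds and the new typed index is builds-v2 (the index names, field names and doc type are assumptions):

curl -s -H 'Content-Type: application/json' -XPUT 'http://localhost:9200/builds-v2' -d '
{
  "mappings": {
    "doc": {
      "properties": {
        "start_time": { "type": "date", "format": "epoch_second" },
        "duration":   { "type": "long" }
      }
    }
  }
}'

curl -s -H 'Content-Type: application/json' -XPOST 'http://localhost:9200/_reindex' -d '
{
  "source": { "index": "builds" },
  "dest":   { "index": "builds-v2" }
}'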

Curator

I now had one index and it was growing quickly. I remembered from previous research that Logstash uses indexes with a date suffix. This allows data to be cleaned up regularly and also allows a new mapping to be applied to new indexes. Creating and deleting indexes is handled by the Curator tool.

I created two scripts: one for deleting indexes that are more than 45 days old and another for creating tomorrow's index with the specified mapping. Running these daily from cron automates the creation and cleanup. The last piece is to have the JSON sent to the index that matches the day.
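
A sketch of the Curator action file for the deletion side, assuming daily indexes named with a builds- prefix and a YYYY.MM.DD date suffix (the prefix and timestring are assumptions about the naming scheme):

actions:
  1:
    action: delete_indices
    description: Delete build indexes older than 45 days
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: builds-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 45

The action file is then run from cron with something like: curator --config curator.yml delete_builds.yml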

Conclusion

Every new piece of software has its learning curve. So far the curve has been quite reasonable for ElasticSearch. I look forward to working with it more.




TUESDAY, 23 MAY 2017

Thoughts on Exercise

Introduction

Three years ago, I read ‘Body by Science’ and it changed the way I think about exercise completely. Unfortunately I wasn’t able to find a gym in Ottawa that used this type of training. I then asked Google for results on ‘HIT bodyweight’ and found Drew Baye and Project Kratos. After reading most of Drew Baye’s blog and watching any videos I could find with him, I purchased Project Kratos and started my experiment.

Doing a workout once a week at home was perfect for my life situation at the time: two young children and full time jobs for both my wife and me. Despite the infrequency, I made progress surprisingly quickly. My numbers on the squat, heel raise and back extension almost tripled in under six months. But I quickly plateaued on exercises like the pushup, chinup and crunch. I tried many different things: negatives, forced reps, more rest, less rest, split routines, etc., but nothing allowed me to break through the plateau.

I started experimenting with different programs like GMB and GB, but I lacked the mobility to do even their entry movements. I also read ‘Supple Leopard’ by Kelly Starrett and ‘Roll Model’ by Jill Miller and realized that mobility was probably my limiting factor. It took a while before I was able to fold what I had read about mobility into my mental model of exercise. Here is the current model that I used to set up my latest exercise routine and goals.

Each movement has three components: mobility, skill and strength. These three components are related in non-linear ways. Even if they cannot be separated, it is still useful to think about the effect of each component on a movement:

Mobility

Do the joints, connective tissue and muscles that participate in the movement have the range of motion required? There are three answers to this question: yes, no and almost.

  • No means the joints do not have the required range of motion. For example I cannot do a single leg squat because I lack the ankle range of motion necessary.
  • Yes means the range of motion is sufficient for the movement.
  • Almost is the trickiest condition, because it can look like the movement can be done, but it is not optimal.

Examples of “almost” include missing shoulder range of motion, so that the lats are not properly engaged for chinups or pullups. When only the arm muscles are used, the number of reps will always be limited and the risk of shoulder injury is higher.

Another example is doing a squat without full ankle or hip range of motion. The squat is possible, but it may lead to a rounded back, which can lead to joint wear and injury.

There are many examples of world class athletes who are still able to excel even with mobility limitations, but they often pay the price for it later. If mobility restrictions are not addressed, injuries and plateaus will keep happening. More complicated movements often require more range of motion, but developing range of motion takes much longer than building skill or strength.

Mobility is more than stretching. Increasing range of motion requires convincing the nervous system that extending further is safe, which requires that the muscle be strong enough and lots of time spent relaxed at the end of the range of motion. The fascia are also involved: sometimes adhesions between fascia layers prevent proper movement. Many times I have had injuries and pains resolve after using a lacrosse ball or ART to apply force to a trigger point.

Skill

Skill is the neurological coordination required to do a movement efficiently. Some movements require little practice, while some full body movements require lots of practice to coordinate all the muscles and parts of the body properly. This is why practicing a movement without going to failure can still result in extra repetitions or apparent strength gains.

Full muscle contraction is another skill that is an important part of HIT. Learning to contract a muscle or a set of muscles under intense discomfort takes practice.

Strength

How best to stimulate the body to produce the desired adaptation response of greater power output and/or muscle size? This is the component that HIT has focused on. The slow movement to momentary muscular failure protocol works well, but there are limitations. If a mobility or skill component is lacking, it will prevent a trainee from achieving proper muscular failure and stimulating an adaptation response.

Training for all components

The best way to train for strength is HIT: short, infrequent movements taken to momentary muscular failure.

Skill training is best done when the muscles are rested, as many skills require strength to perform properly. Skill training therefore works best with many repetitions using a lighter load and careful focus on form.

Mobility training is very different from skill or strength training. It takes a lot of time and is specific to each individual body. It works best when done daily and integrated into other daily activities. I have had to find creative ways to combine a usual activity with stretching or working on fascia. For instance, I will read in a straddle stretch and meditate in a squat position.

Programming

How best to combine all this information into a weekly program? That depends on the goals and time available, of course.

If the goal is maximum ROI for minimum time investment, HIT strength training has the best returns. Movements that do not require skill or mobility components will have the best return, which is why many HIT gyms use machines. Bodyweight HIT works, but many of the movements have a mobility and skill component, and each individual will experience different limitations.

The next step is to add mobility and skill practice. Unfortunately both these require significantly more time investment. Choose a specific skill and do daily mobility and skill repetitions.

For example, I chose the L-Sit and Crow Pose as skills, and I do shoulder and wrist mobility work followed by those skills on the days when I do not do HIT strength training. The mobility work takes 15-20 minutes. The skill work only takes a few minutes. I try to fit in some more skill work during the day to maximize the repetitions.

I have carved out a time for this every day. I will keep going while there is progress and then switch to something else when I stop progressing.

Conclusion

Thanks for reading this far. The key to staying motivated is progress and the key to progress is focusing on very specific goals.




TUESDAY, 9 MAY 2017

Hashicorp Vault based PKI

Introduction

One of the trends I have noticed is that open source tools encrypt network connections by default. Some tools like Puppet even make it impossible to disable TLS encryption and provide tooling to build an internal Certificate Authority. Docker requires overrides for any registry that does not have a verified TLS cert. Many tools also generate self signed certs, which Firefox and Chrome will only accept after manual overrides.

The solution is to have an internal Certificate Authority with its root CA certificate in the trusted store. This internal CA can then be used to generate certs which will be trusted. But there are always complications. Many programs do not use the OS trusted store and require extra configuration to add trusted certs. For example, Java applications require several steps to generate a new trust store file and configuration to make it available to the application. Docker has a special directory for trusted registry certs.
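
On Debian and Ubuntu style systems, getting the root CA into the OS trusted store looks roughly like this (the file name is a placeholder; Java and Docker still need their own configuration as mentioned above):

> sudo cp wrlinux-root-ca.crt /usr/local/share/ca-certificates/
> sudo update-ca-certificates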

Options

There are many CA solutions available: OpenCA, CertStrap, CFSSL, Lemur and many others. As I looked through all these programs, a couple of things kept bugging me. Creating certs is easy; revocation is where it gets really messy. The critical question is how to handle revocation in a sensible way. How can the system recover from a root CA compromise? Once I started reading about CRLs, OCSP and OCSP stapling, I got really discouraged. That is why I was intrigued by Hashicorp Vault and its PKI backend.

Vault

Vault is a tool for managing secrets of all kinds, including tokens, passwords and private TLS keys. It is quite complex and the CLI is non-obvious. It supports backends for authentication, secret storage and auditing. It has a comprehensive access control language and a generic wrapper concept that makes it possible to pass secrets around without revealing them to the middleman.

Vault solves the revocation and CA compromise problem by making it largely unnecessary. It provides a secure, audited, out of band channel for distributing secrets like certs, which enables very short lived certs and secure, automated reissuing of certs.

Vault PKI

That is the theory, so I decided to try it in practice by creating a CA and some certs.

1) Start Vault server, initialize, unseal and authenticate as root:

> vault server -config config.hcl
> export VAULT_ADDR='http://127.0.0.1:8200'
> vault init -key-shares=1 -key-threshold=1
Unseal Key 1: LbOw129fyB3OAzZvxq9RMQefNH8fFm7twS3wlg5Zv2o=
Initial Root Token: d9e9d69b-5d49-e753-3ef2-e6b36c0fb45a
> vault unseal LbOw129fyB3OAzZvxq9RMQefNH8fFm7twS3wlg5Zv2o=
> vault auth
Token (will be hidden): d9e9d69b-5d49-e753-3ef2-e6b36c0fb45a
Successfully authenticated! You are now logged in.

Of course this is for development only. A production deployment would use more key shares and a higher threshold, and the unseal keys should be encrypted using GPG. Note that the root token can be changed later.
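
A production style init might look like the following, with the unseal key shares encrypted against the operators' PGP keys (the keybase identities here are placeholders):

> vault init -key-shares=5 -key-threshold=3 \
    -pgp-keys="keybase:alice,keybase:bob,keybase:carol,keybase:dave,keybase:eve"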

2) Create self signed Cert with 10 year expiration

> vault mount -path=wrlinux -description="WRLinux Root CA" -max-lease-ttl=87600h pki
Successfully mounted 'pki' at 'wrlinux'!
> vault write wrlinux/root/generate/internal common_name="WRlinux Root CA" \
    ttl=87600h key_bits=4096 exclude_cn_from_sans=true
certificate     -----BEGIN CERTIFICATE-----
MIIFBDCCAuygAwIBAgIUMt8NYFtqaYk8Q1OUfdOWuPjXI0IwDQYJKoZIhvcNAQEL
...
serial_number   32:df:0d:60:5b:6a:69:89:3c:43:53:94:7d:d3:96:b8:f8:d7:23:42
> curl -s http://localhost:8200/v1/wrlinux/ca/pem | openssl x509 -text
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            32:df:0d:60:5b:6a:69:89:3c:43:53:94:7d:d3:96:b8:f8:d7:23:42
...

Note that this is the only time private cert is exposed.

3) Keep root CA offline and create second vault for intermediate CA

Create CSR for Intermediate CA

> vault mount -path=lpd -description="LPD Intermediate CA" -max-lease-ttl=26280h pki
> vault write lpd/intermediate/generate/internal common_name="LPD Intermediate CA" \
ttl=26280h key_bits=4096 exclude_cn_from_sans=true
csr     -----BEGIN CERTIFICATE REQUEST-----
MIIEYzCCAksCAQAwHjEcMBoGA1UEAxMTTFBEIEludGVybWVkaWF0ZSBDQTCCAiIw
...

4) Sign CSR and import Certificate

Note: the intermediate private key never leaves Vault.

> vault write wrlinux/root/sign-intermediate csr=@lpd.csr \
common_name="LPD Intermediate CA" ttl=8760h
Key             Value
---             -----
certificate     -----BEGIN CERTIFICATE-----
MIIFSzCCAzOgAwIBAgIUAY8RmTDEzwbkUQ0smevPPIPXOkYwDQYJKoZIhvcNAQEL
...
-----END CERTIFICATE-----
expiration      1523021374
issuing_ca      -----BEGIN CERTIFICATE-----
MIIFBDCCAuygAwIBAgIUMt8NYFtqaYk8Q1OUfdOWuPjXI0IwDQYJKoZIhvcNAQEL
...
-----END CERTIFICATE-----
serial_number   01:8f:11:99:30:c4:cf:06:e4:51:0d:2c:99:eb:cf:3c:83:d7:3a:46
> vault write lpd/intermediate/set-signed certificate=@lpd.crt
Success! Data written to: lpd/intermediate/set-signed

5) Create Role and generate Certificate

Vault uses roles to set up cert creation rules.

> vault write lpd/roles/hosts key_bits=2048 \
max_ttl=8760h allowed_domains=wrs.com allow_subdomains=true \
organization='Wind River' ou=WRLinux
Success! Data written to: lpd/roles/hosts
> vault write lpd/issue/hosts common_name="yow-kscherer-l1.wrs.com" \
ttl=720h
private_key             -----BEGIN RSA PRIVATE KEY-----
MIIEpAIBAAKCAQEAvxHQzyEjc13djntQfCo1ncpwU18a8c8iI4OdaOSQV72zbHf2
...
-----END RSA PRIVATE KEY-----
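
To double check that an issued cert chains back to the root, openssl can verify it against the CA certs (the file names are placeholders for wherever the PEM output above was saved):

> openssl verify -CAfile wrlinux-root-ca.pem -untrusted lpd-intermediate.pem yow-kscherer-l1.wrs.com.pem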

6) Final Steps

  • Import the root CA cert into the trusted store
  • Create a policy to limit role access to cert creation (a minimal sketch follows this list)
  • Use a program like vault-pki-client to automate cert regeneration
  • Audit that certs are only created at expected times
  • Automate cert regeneration
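
A minimal policy sketch that only allows issuing host certs from the role created in step 5 (the policy name issue-hosts is an assumption):

path "lpd/issue/hosts" {
  capabilities = ["create", "update"]
}

> vault policy-write issue-hosts issue-hosts.hcl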

Conclusion

Once this is set up, Heartbleed is a non-event! Besides the PKI, I can also use Vault to manage other kinds of secrets.
