CSV to Elasticsearch in order to replace your Excel with Kibana

Posted on November 25, 2018 in Python

Last night, I was asked if I could set up some frontend to build stats out of a CSV, in a more interactive and collaborative way than Excel.

I was first asked to do a small Django project, since it used to be my go-to technology at the time.

But for this need, Elasticsearch was a perfect fit, and Kibana let me skip developing any frontend, which saved a lot of time.

The CSV export looked like this, except the real one had at least 30 columns and 200k lines:

Column1;Column2;Column3
1;2;3
a;b;c
blih;blah;bluh

Python way to push the CSV to Elasticsearch

Elasticsearch requires JSON documents, so the first step was to convert the CSV to JSON.

Instead of writing a CSV to JSON parser, I used the pandas library, which makes the whole process a lot easier and faster (the CSV file had hundreds of thousands of lines).

And by looking at the official Elasticsearch Python SDK, I just needed to transform the whole CSV into a list of dicts.
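
To see exactly what pandas hands over to Elasticsearch, here is a tiny self-contained example of the trim/fillna/to_dict chain (the column names and values are made up for illustration):

```python
from io import StringIO

import pandas as pd

# a tiny stand-in for the real semicolon-separated export
csv = "Name;Team;Score\n alice ;blue;10\nbob; red ;\n"
df = pd.read_csv(StringIO(csv), sep=';')

# strip whitespace on string columns, then replace NaN with empty strings
df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
df = df.fillna('')

# one dict per CSV row, ready to be used as an Elasticsearch `_source`
records = df.to_dict(orient='records')
print(records[0]["Name"])   # "alice" -- whitespace stripped
print(records[1]["Score"])  # ""      -- NaN replaced
```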

import argparse

import pandas as pd
from elasticsearch import Elasticsearch, helpers

def main():
    parser = argparse.ArgumentParser(description='Push a CSV file to Elasticsearch.')
    parser.add_argument('filename', type=str,
            help='filename to parse')
    parser.add_argument('index', type=str,
            help='index name to use')
    args = parser.parse_args()

    filename = args.filename
    index_name = args.index

    # initiate Elasticsearch connection
    es = Elasticsearch()

    # parse the csv with pandas
    df = pd.read_csv(filename, sep=';', error_bad_lines=False)
    # trim whitespace on string columns
    data_frame_trimmed = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
    # replace `nan` values with empty strings
    data_frame_trimmed = data_frame_trimmed.fillna('')
    # transform the whole data frame into a list of dicts, one per row
    records = data_frame_trimmed.to_dict(orient='records')

    # use bulk actions to push the data
    actions = []
    for i, r in enumerate(records):
        actions.append({"_index": index_name,
                "_type": "vuln",
                "_id": i,
                "_source": r})
    ret = helpers.bulk(es, actions=actions)

if __name__ == '__main__':
    main()
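
For very large CSVs, building the whole actions list in memory can be avoided by feeding helpers.bulk a generator instead. A small sketch of that variation (same action shape as above):

```python
def actions(records, index_name):
    """Yield bulk actions lazily instead of materializing one big list."""
    for i, r in enumerate(records):
        yield {"_index": index_name,
               "_type": "vuln",
               "_id": i,
               "_source": r}

# helpers.bulk(es, actions(records, index_name)) would consume this lazily
first = next(actions([{"Column1": 1}], "vuln_2018-11-25"))
print(first["_id"])  # 0
```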

And voilà!

$ curl http://localhost:9200/vuln_2018-11-25/_search?pretty | jq -r .
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "vuln_2018-11-25",
        "_type": "vuln",
        "_id": "0",
        "_score": 1,
        "_source": {
          "Column1": 1,
          "Column2": 2,
          "Column3": 3
        }
      },
      {
        "_index": "vuln_2018-11-25",
        "_type": "vuln",
        "_id": "1",
        "_score": 1,
        "_source": {
          "Column1": "a",
          "Column2": "b",
          "Column3": "c"
        }
      },
      {
        "_index": "vuln_2018-11-25",
        "_type": "vuln",
        "_id": "2",
        "_score": 1,
        "_source": {
          "Column1": "blih",
          "Column2": "blah",
          "Column3": "bluh"
        }
      }
    ]
  }
}

After that, you just need to install Kibana and enjoy your graphs, tables, dynamic filters and so on, in a much more collaborative way.

Continue reading

Still here !

Posted on November 21, 2018 in misc

After a few years of not having the time or motivation to write and share thoughts, discoveries, or useless snippets, I'm taking the time to move this blog to GitHub Pages.

Reading the last articles, I couldn't help but lol :-)

I'm going to try taking the time to write the following articles:

  • How I've helped a startup move from AWS S3 to an on-premise object storage platform powered by OpenIO
  • Write a long-overdue article on how I built an 80 Gbit/s live video streaming platform for the Free Mobile announcement
  • How to use dnsdist as a reverse proxy for your public-facing DNS servers and protect them
  • Complete the debootstrap how-to by adding the UEFI part

Continue reading


Salt Stack

Posted on April 24, 2014 in System

For a few months now, I've been inclined to test and use Salt Stack. I manage a lot of heterogeneous platforms, but each one is composed of similar machines that do the same things.

For example, every three months or so, I'm asked to install new packages or configure a new printer on the desktop machines of our datacenter's collaborators. What a great use case :)


Salt is like Puppet and Chef, which are also deployment and automation tools, but I find it more lightweight and easier to get started with.


It seems that Salt Stack is not yet in the official Ubuntu repositories, so you need to add the project's PPA.

Things to do on your master host:

apt-get install python-software-properties
add-apt-repository ppa:saltstack/salt

apt-get update
apt-get install salt-master

Things to do on your client host:

apt-get install python-software-properties
add-apt-repository ppa:saltstack/salt

apt-get update
apt-get install salt-minion

By default, a Salt Minion will try to connect to the DNS name "salt"; if the Minion is able to resolve that name correctly, no configuration is needed. If the DNS name "salt" does not resolve, you need to edit /etc/salt/minion and set the master directive yourself.
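
For example, if your master's hostname were salt-master.example.com (a made-up name for illustration), the minion configuration boils down to one directive:

# /etc/salt/minion
master: salt-master.example.com

Restart the salt-minion service afterwards so it picks up the change.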


Restart everything. On the master:

/etc/init.d/salt-master restart

And on the minion:

/etc/init.d/salt-minion restart


Communication between the Master and your Minions is encrypted with AES. But before they can communicate, each Minion's key must be accepted by the Master.

List all keys:

$ salt-key -L
Accepted Keys:
Unaccepted Keys:
NOC1-VTY2
Rejected Keys:

Accept all keys:

$ salt-key -A

Accept one key:

$ salt-key -a NOC1-VTY2

If you list your keys again you should get an output like this:

$ salt-key -L
Accepted Keys:
NOC1-VTY2
Unaccepted Keys:
Rejected Keys:

You can now test the communication between your Master and one or all of your Minions:

$ salt 'NOC1-VTY2' test.ping
$ salt '*' test.ping


Now, I want to be able to add another computer to our NOC team without having to manually push all the configuration (NIS/NFS/packages, etc.).

There are two major things: the file_roots directive and the top.sls file. According to the documentation, an SLS (or SaLt State) file is a representation of the state in which a system should be.


In your /etc/salt/master file, you need to uncomment the file_roots directive. It defines the location of the Salt file server and the SLS definitions. Mine looks like this:

file_roots:
  base:
    - /srv/salt/

After this modification, restart your salt-master.


Doing specific things to specific machines is the main purpose of Salt. This is defined within the top.sls file.

This can be done by:

Way                   Example
Globbing              "webserverprod"
Regular Expressions   "^(memcache|web).(qa|prod).loc$"
Lists                 "dev1,dev2,dev3"
Grains                "os:CentOS"
Node Groups
Compound Matching

This is my top.sls file (the target patterns did not survive formatting, so the matchers below are illustrative stand-ins):

base:
  '*':
    - nagios.client
  'os:CentOS':
    - match: grain
    - repos.online
  'NOC.*':
    - match: pcre
    - yp.install
    - yp.nsswitch
    - nfs.mount_noc


  '*':
    - nagios.client

This block declares the global environment every minion must apply. In this case, every machine is assigned the nagios.client state. It executes /srv/salt/nagios/client.sls.


This section matches machines using the Salt "grain" system, basically system attributes. It will execute /srv/salt/repos/online.sls.


This section matches using Perl-compatible regular expressions. If the hostname of the machine matches the regex, it is assigned the listed states, executing /srv/salt/yp/install.sls, /srv/salt/yp/nsswitch.sls and /srv/salt/nfs/mount_noc.sls.
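
As an illustration of how a dotted state name maps to a file (this is a hypothetical sketch, not the actual file from this setup), yp.install resolves to /srv/salt/yp/install.sls, which could simply declare a package:

# /srv/salt/yp/install.sls -- hypothetical example
nis:
  pkg.installed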


Continue reading

How-To Debootstrap

Posted on avril 23, 2014 in System

For my infrastructure purposes, I often need to install servers as fast as possible. Most of my servers come with 4 disks and one or more RAID cards.

I usually don't trust the RAID cards, so I always create one RAID0 per disk in order to use every logical volume as if it were a real disk.

And I always use the following partition scheme:

mount   size
/boot   200M
/       *


# find your slot
slot=`hpacucli ctrl all show | grep -i slot | awk '{print $6}'`
hpacucli ctrl slot=$slot ld 1 delete
# create one raid0 per physical disk
for phys in `hpacucli ctrl all show config | grep physicaldrive | awk '{print $2}'`; do
  hpacucli controller slot=$slot create type=ld drives=$phys raid=0
done


If you use an old server, you must do some cleaning first.

Let's start by zeroing the first 100MB of each disk in order to be sure to erase the partition table and the MBR:

for i in {a..d} ; do
  dd if=/dev/zero of=/dev/sd$i count=100 bs=1M
done

Afterwards, let's notify the kernel about the device changes:

partprobe

MSDOS partitions

for i in {a..d} ; do
  parted /dev/sd$i --script -- mklabel msdos
  parted /dev/sd$i -a optimal --script -- unit MB mkpart primary 1 200
  parted /dev/sd$i -a optimal --script -- unit MB mkpart primary 200 -1
done

GPT partitions

For GPT partitions, you need to create a BIOS Boot partition: a small partition of at least 1MB.


for i in {a..d} ; do
    parted /dev/sd$i --script -- mklabel gpt
    parted /dev/sd$i -a optimal --script -- unit MB mkpart grub fat32 1mb 2mb
    parted /dev/sd$i -a optimal --script -- unit MB set 1 bios_grub on
    parted /dev/sd$i -a optimal --script -- unit MB mkpart primary 2mb 200
    parted /dev/sd$i -a optimal --script -- unit MB mkpart primary 200 -1
done


I prefer to use software RAID with mdadm. If you want to boot from an mdadm volume, it needs to use the 0.90 metadata format. For your /, use the RAID level you want and don't give any metadata parameter, so it takes the default 1.2 format.

/!\ If you use GPT partitions, be aware that /dev/sdx1 is the BIOS Boot partition, not your future /boot; start at /dev/sdx2.

# for msdos partitions
mdadm --create /dev/md0 --metadata=0.90 --assume-clean --raid-devices=4 --level=1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm --create /dev/md1 --assume-clean --raid-devices=4 --level=6 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2

# for gpt partitions
mdadm --create /dev/md0 --metadata=0.90 --assume-clean --raid-devices=4 --level=1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
mdadm --create /dev/md1 --assume-clean --raid-devices=4 --level=6 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3

Let's format the RAID volumes

mkfs.ext4 /dev/md0
mkfs.ext4 /dev/md1

Let's start the debootstrap session. I use a basic /etc/apt/sources.list made with a convenient online sources.list generator.

mkdir /mnt/root
mount /dev/md1 /mnt/root
apt-get update; apt-get install -y debootstrap
debootstrap trusty /mnt/root
mount -o bind /dev /mnt/root/dev
mount -o bind /proc /mnt/root/proc
mount -o bind /sys /mnt/root/sys

# basic fstab
echo "proc            /proc   proc    defaults                0       0
/dev/md1 /       ext4    errors=remount-ro       0       1
/dev/md0        /boot   ext4    defaults                0       2
" > /mnt/root/etc/fstab

echo "#############################################################
################### OFFICIAL UBUNTU REPOS ###################

###### Ubuntu Main Repos
deb http://fr.archive.ubuntu.com/ubuntu/ trusty main restricted universe multiverse
deb-src http://fr.archive.ubuntu.com/ubuntu/ trusty main restricted universe multiverse

###### Ubuntu Update Repos
deb http://fr.archive.ubuntu.com/ubuntu/ trusty-security main restricted universe multiverse
deb http://fr.archive.ubuntu.com/ubuntu/ trusty-updates main restricted universe multiverse
deb-src http://fr.archive.ubuntu.com/ubuntu/ trusty-security main restricted universe multiverse
deb-src http://fr.archive.ubuntu.com/ubuntu/ trusty-updates main restricted universe multiverse
" > /mnt/root/etc/apt/sources.list

Now we can go into the installed volume and prepare the OS:

cd /mnt/root
chroot .
# mount /boot for the future kernel installation
mount /boot
# generate a few locales
locale-gen fr_FR.UTF-8
locale-gen fr_FR
locale-gen en_US.UTF-8
locale-gen en_US

apt-get update
# don't forget to install mdadm on the system so it can boot correctly
apt-get install -y mdadm lvm2
# install the required kernel
apt-get install -y linux-image-generic
# install an openssh-server so you can remotely have access to the system
apt-get install -y openssh-server
# change your root password!!
echo "root:changeme"|chpasswd

Stop the few services we started:

/etc/init.d/ssh stop

Unmount everything, sync for the last I/O, and reboot:

umount /boot
exit
umount /mnt/root/dev
umount /mnt/root/proc
umount /mnt/root/sys
umount /mnt/root
sync
reboot


Work in progress


Without LVM

If you happen to boot a rescue live-CD on one of these configurations, it will detect the RAID arrays but without the correct device names:

mdadm -S /dev/md126
mdadm -S /dev/md127
mdadm --examine --scan /dev/sda{1..4} >> /etc/mdadm/mdadm.conf
mdadm --assemble --scan

Your /dev/md0 and /dev/md1 should come online

mkdir -p /mnt/root
mount /dev/md1 /mnt/root
mount -o bind /dev /mnt/root/dev
mount -o bind /proc /mnt/root/proc
mount -o bind /sys /mnt/root/sys
chroot /mnt/root

Here you go!


Thanks to my friends Pierre Tourbeaux and Michael Kozma for all the advice and debugging over the years :)

Continue reading


ZFS On Linux

Posted on April 18, 2014 in System

At Online, we've been trying ZFS On Linux on a few services.

Here's a small how-to (and also a reminder) on how to install it and manage it:


    $ apt-add-repository --yes ppa:zfs-native/stable
    $ apt-get update && apt-get install ubuntu-zfs

ZFS comes with a software-RAID-like system:

RAID type   RAID-Z type   Losable disks   Min disks
RAID5       raidz         1 disk          3 disks
RAID6       raidz-2       2 disks         4 disks
RAID7       raidz-3       3 disks         5 disks

Now we're going to create a zpool called storage

    $ zpool create -f storage raidz2 c2d{1..5}

If we want to add MOAR disks:

    $ zpool add -f storage raidz2 c2d{6..10}

Here are a few problems I've experienced:

ZFS Resilvering (replace a drive)

If you've got some spare disks, you should add them to your spare pool:

    $ zpool add storage spare c2d11 c2d12

By doing so, if a disk fails, ZFS will automatically replace it with a spare. Personally, I prefer to do it manually. Assuming c2d4 failed and we want to replace it with c2d11, let's do this:

    $ zpool replace storage c2d4 c2d11

You will now have c2d11 resilvering your entire zpool. Once the resilver ends, the failed disk is ejected from the zpool.

ZFS Scrubbing

ZFS has a scrub feature to silently detect and correct errors. You could compare this to ECC RAM (RAM with error recovery). The scrub feature checks every block of your pool against its SHA-256 checksum.
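
The core idea of scrubbing — recompute each block's checksum and compare it with the one stored at write time — can be sketched in a few lines of Python (purely illustrative, not how ZFS is actually implemented):

```python
import hashlib

def scrub(blocks, stored_checksums):
    """Return the indices of blocks whose data no longer matches its checksum."""
    return [i for i, (data, expected) in enumerate(zip(blocks, stored_checksums))
            if hashlib.sha256(data).hexdigest() != expected]

# block 1 was silently corrupted after its checksum was recorded
blocks = [b"good data", b"bit-rotted data"]
checksums = [hashlib.sha256(b"good data").hexdigest(),
             hashlib.sha256(b"original data").hexdigest()]
print(scrub(blocks, checksums))  # [1]
```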

You can invoke a scrub yourself, or be forced to live through one when a disk fails and you have to replace it.

Recently, on a 200T system, I replaced a failed disk with a spare one, which scrubbed the whole 200T. The zpool status was mentioning a duration of about 500 hours of scrubbing. Time to hang yourself.

Fortunately, there are some tunable settings in /sys/module/zfs/parameters:

    # Prioritize resilvering by setting the delay to zero
    $ echo 0 > zfs_resilver_delay

    # Prioritize scrubs by setting the delay to zero
    $ echo 0 > zfs_scrub_delay

These changes take effect immediately, and I haven't experienced any problems afterwards. Everything synced in 60 hours.

Here are a few other parameters to tune your scrub:

feature                    default  description
zfs_top_maxinflight        32       maximum I/Os per top-level vdev
zfs_resilver_delay         2        number of ticks to delay resilver
zfs_scrub_delay            4        number of ticks to delay scrub
zfs_scan_idle              50       idle window in clock ticks
zfs_scan_min_time_ms       1000     min millisecs to scrub per txg
zfs_free_min_time_ms       1000     min millisecs to free per txg
zfs_resilver_min_time_ms   3000     min millisecs to resilver per txg
zfs_no_scrub_io            0        (bool) set to disable scrub i/o
zfs_no_scrub_prefetch      0        (bool) set to disable scrub prefetching


Continue reading

Symfony performances

Posted on April 18, 2014 in System

For a few weeks now, we've been stumbling upon performance problems on our Symfony2 backend. For the record, it's a 50k-line codebase with lots of features and custom bundles.


On the first request, Symfony's PHP code must discover all the classes of your project. It does a lot of stat/open/read/close calls on each file. We've observed 100% CPU usage for a few seconds, the time required for the code to discover everything.

By default, the autoloader is dumped without the --optimize flag.

So we had to customize our Fabric script by adding:

  $ php composer.phar dump-autoload --optimize

For example, our autoloader file was about 300 lines before. With the --optimize flag, it now has more than 5,000 lines.

To be continued with APC support and the OpCode cache of PHP 5.5.

Continue reading

Welcome !

Posted on April 10, 2014 in Python

Welcome to this brand new blog, powered by Pelican, a really nice static blog generator.

Continue reading