ODROID XU4s are awesome. They are 8-core, 2GHz ARM single-board computers (SBCs) with Gigabit Ethernet and USB 3.0 connections. They only have 2GB of DDR3 RAM, but when paired with a CloudShell2 case and a couple of HDDs (or SSDs), they become an impressive NAS, or better, dedicated network-activity drivers for other projects.
For my machine learning projects, I need data you just can’t buy. This requires SPA (Single-Page App) web data extraction involving multiple clicks and page scrolling that curl can’t handle. Headless Chrome puppeted by RDP (Remote Debug Protocol) is a brilliant solution for this. Here is how I orchestrated several headless Chrome instances across several VPNs in Docker.
A power supply, when suddenly turned off, bleeds voltage slowly. Attached electronics experience a gradual voltage decline from 5V to 3.3V and eventually to zero. The problem is that microcontrollers and microprocessors don’t know how to behave with under-voltage. Their behavior and flash memory integrity is not defined. Flash memory can even be erased. Here I outline my attempts to achieve an efficient logic-level power supply.
Efficiently transport integer-based financial time-series data to dedicated machines and research partners by experimenting with the smallest data transport format(s) among Avro, Parquet, and compressed CSVs.
I bricked a lot of ATtiny85 Digistump chips while I was building a custom bootloader. Here is how I unbricked them and flashed onto them bootloaders with less than $5 of hardware using an AliExpress CH341A BIOS programmer and some soldering. The problem is that the CH341A is not designed for ATtiny chips, but for flashing motherboard BIOS chips, so I had to hack it.
Let’s say I’m in a Starbucks or the airport (or both) and I want to connect to my Windows (or OSX or Linux) machine to check on things. Maybe I don’t want to take my primary computer with me on vacation; maybe I’ll just take a Chromebook. Behind a restrictive firewall all we have is port 80 and port 443 (no VNC or RDP allowed), so let’s make a secure web-based remote desktop gateway with a Raspberry Pi, Docker, and Cloudflare.
Here I demonstrate the benefits of storing financial numbers as integers instead of doubles in SQLite, maintain a five-decimal precision, and greatly reduce my database size in the process.
These useful Linux commands and snippets are a reference for myself, all in one place, so I can refer to them quickly during my workflow without searching for them again and again. These include networking, git, grep, journald, ss/netstat, process uptime, and more. More are added as I go.
My project this weekend was to fork both the BlinkStick C firmware and the Java API to make the Digispark USB hardware with the AVR ATtiny85 microprocessor do something never done before: execute color patterns on the microcontroller, not the host CPU. I outline how I failed many times, and how I eventually succeeded with links to my Github repos and pictures of my hardware hacks.
Because I enjoy using Java so much, and maybe as a reference for the next time I’m playing code golf, I’ve noted some of the lesser-known, obscure features and quirks of Java 8+. You probably know them already, but I find them neat and want to reference them here.
This exercise gave me the opportunity to re-read the docs on the SQLite parameters and confirm their advice with imperial testing. I was able to identify a good page size and understand how the WAL journal works better with the normal synchronous mode. My applications are already running faster.
Why use AWS Glacier for big data backup? It’s exceedingly inexpensive to archive data for disaster recovery on Glacier. AWS Glacier is only US$0.004 per GB/mo, and their SDK is beautiful. Here I outline a pricing matrix for cloud storage providers, and I take a look at the Java SDK for working with AWS Glacier to effectively archive 200GB a week.
This is a problem story about how I preferred Java to other languages to communicate with a troublesome financial REST API endpoint because Java is a strongly-typed and verbose language where it is easy to write unit tests and build up solid modules to make a complete, resilient project.
Normally data packets come and go on the same interface, but VPN routing causes response packets to return through the tunnel and are dropped as unsolicited traffic – the connection hangs until a timeout. This makes it difficult to SSH into a server with an active VPN connection, but I explain a way do just that.
Sometimes remote Java apps leak memory or are killed by the OS. Let’s connect through an SSH tunnel to a remote JVM running on an embedded Ubuntu system and profile memory and CPU usage with free tools VisualVM and JStatD, or Java Mission Control. No firewall adjustments are needed. We’ll also set up JMX connections to allow remote heap dumps and garbage collection. Finally, I’ll explore the features of VisualVM.
Let’s setup a remote Docker daemon on AWS and connect to it securely over HTTPS with PhpStorm. This will allow us to develop and administer Docker containers remotely with the PhpStorm IDE. Here we’ll create TLS certificates, configure the Docker daemon, verify the setup, and configure PhpStorm to even use Docker Compose remotely.
Sometimes you need to be alerted when a website string is present or absent. Here I outline a quick and easy method to alert you to such changes specifically in JSON data using just a free Chrome browser extension. I then showcase two case studies how this has helped me with retail shopping and domain name purchasing.
Breadboard power supplies cost less than a dollar on AliExpress. They are quite convenient for quickly powering and prototyping microprocessor circuits, Arduino projects with sketches, USB-powered prototypes, and on. The imagination is the limit. I spent the morning trying to figure out why my MB102 breadboard power supply was outputting only 3.5V, not the expected 5.0V.
Given a cluster computing rig of twenty-eight processors, each can have either a USB 2.0 or microSD local flash storage. Which type of flash and maker is the fastest? Make the wrong choice and the cluster is painfully slow. Not all microSD cards or USB drives are made the same, and interestingly random read and write speeds vary wildly. Here I test several storage configurations with striking benchmark results.
Markdown is used all the time, for example in GitHub readme.md files, in Slack messages, and in WordPress themes. The HTML produced by rendering Markdown has no class or id attributes, and cannot be nested in HTML tags. How can we style an individual Markdown element? Let’s use a neat CSS trick to easily do just that.
My newer-model Panasonic microwave oven stopped working. To get it working I needed to get past anti-tamper screws and “special” fuses. I suspect Panasonic wants us to buy another microwave instead. Not this time!
For the cluster computing project I’m working on, I need 28 microSD cards. There was an AliExpress sale with good reviews, so I ordered a batch of 30 microSD cards, and at a great price point at the time. As long as the cards are Class 10 and work then we should be good, right? Results: Half are fake or defective. The rest are painfully slow. No refunds.
Let’s build a 112-core 1.2GHz A53 cluster with 56GB of DDR3 RAM and 584GiB of high-availability distributed file storage, running at most 200W. The goal is to use cluster computing to perform fast Apache Spark operations on Big Data, and all on-prem for a fraction of what cloud computing costs.
Problem: How to clean the raw OHLCV candle data from the broker for time series analysis? Suppose we have an autonomous program that prioritizes and continually downloads the latest minute and day candles, as well as periodically gets new symbols from the broker. The problem is that the candles are not guaranteed to be full-period […]
Before acquiring financial time-series candles, I need to know the database schema, storage growth, and cost of maintaining the database. How large could financial data grow and cost?
This would make a good interview question: There are about 120,000 public North American securities, bonds, rights, and index symbols. You have a paid API that can access all of them in OHLCV format if they are quotable. There are two critical API constraints: 15,000 calls per hour 20 calls per second Napkin math Minute […]
Things break. Just the other day through a series of seemingly unrelated events, a new Microsoft x509 certificate made its way into a security handshake process which went unnoticed until current single sign-on sessions began to expire. Had we also had automated security testing, we would have caught this one-off. I’ll explain how I set […]
With PHPUnit I need to mock a class that is declared deep, deep in an arrangement of other classes. I’ll show you how easy it is with class_alias() and the right PHPUnit annotations. Background A given class spins up a headless Chrome instance. I don’t want to fire up and close a real Chrome instance […]
Here are few PHP web shell scripts I found in a production server in late 2016. I’ll show some of them, sneaky as they be, and then my efforts for securing a production server.
I’d like to share my efforts to prevent page breaks in the middle of paragraphs and maximize the use of page space when printing web pages to PDF. I’ll outline how this PHP+NodeJS+Chrome tool and algorithm accomplish this. The motivation is to prevent pictures from being cut off, cut halfway through, or from being pushed […]
GoAccess web log analyzer is a beautiful tool to show real-time traffic and stats – including GeoIP information, bandwidth usage, and visitor time distributions – of my web projects and apps over and above Google’s Webmaster Tools and UA reporting. At a glance I can see current traffic and historic traffic just by adding a […]
Here’s how to go about debugging, stepping through, and profiling remote code like a breeze. These are the steps I took to install/enable Xdebug on a remote LAMP stack and debug/profile hosted code using PhpStorm and a Chrome extension. As a bonus I’ll share how I debug cURL requests with Xdebug too. 1. Setup remote […]
Among friends let’s agree we’ll be privately caching videos and not permanently saving them, or we’ll be using them for Fair Use, and we’ll certainly not upload nor share these videos outside of the originating platform (e.g. YouTube.com). Existing YouTube downloader scripts: YouTube-Downloader (does not work with videos using a cipher signature) YouTube video downloader […]
These are the steps I took to compile Firefox so it can run on a RHEL shared hosting server which doesn’t have D-Bus installed and only has GLibc 2.12. Situation You want to run headless Firefox on a shared host running RHEL You don’t have privileged access (e.g. no root) Your shared host only has […]
This is how I compiled the Xorg Server for RHEL on a CentOS machine with modifications to create a portable Xvfb binary. Xvfb (X virtual framebuffer) is an in-memory display server for Linux and Unix-like OSes. It enables running graphical applications without a display such as running a headless browser (e.g. A full-blown Firefox instance […]
The goal is to run Linux binaries on a shared host where we do not have root access, and there are no package managers installed (because it’s shared hosting). The shared libraries need to be copied over manually. Here is how we can do it. If you’re like me, you might have a (few) shared […]
When getting started with LINE API messaging, you need to know the mid of a message recipient. It’s not his/her username. It’s a string that looks like ub8dbd4a12c322f6c0118883d839c55a4. LINE utilizes a callback URL that you can set for your trial LINE bot. At this endpoint you can place a script, shown below, which will report […]
Sometimes I want to share a large file on my site without tying up bandwidth. If I don’t intend for the file to be downloaded often, I can offload the work to Dropbox and use PHP or htaccess to share a convenient URL. https://yoursite.com/project/psdfile/index.php https://yoursite.com/project/psdfile.psd You can get the direct download link for a Dropbox […]
Every now and then there is an hours-long campaign of fraudulent AdWords-clicking from countries all over the world, ranging from Iran to Singapore, dedicated to clicking my cost-per-click Google ads in a vain attempt to exhaust a given daily budget early. My hat goes off to the chap for organizing the attack, or at least […]
The inspiration to make my own Pokémon Go scanner came from this great site FastPokeMap.se (and Twitter feed). Try this site first before venturing out to make your own scanner. It’s a neat site, but unfortunately each scan is slow takes upwards of 20 seconds, and the failure rate is high. It’s strength comes from […]
Using a fabulous WordPress plugin called External Links to put little arrow icons beside external links was the task this morning. Installation and setup took less than 60 seconds and all worked well. But wait, there are arrow icons in my header navigation links. Surely as per the documentation I can add the noicon class […]
Sometimes one of my sites is under attack from a click-fraud campaign. I needed to devise a way to detect such an attack and instantly and automatically change my Cloudflare security level from ‘medium’ to ‘under attack’. When in under-attack mode, Cloudflare performs additional browser checks to filter out robots. It doesn’t stop all the […]
That was really a really high bridge – 70m – and the river was shallow beneath us. There was no one else there. The team of jump masters was ready for just us two. It was amazing, like we had the whole valley to ourselves, Alex and I. We wanted an adventure, something we’d never […]