Do You Need A Residential Data Hub?

Data is essential for effective decision-making – even at home.

With more and more devices running in every home, it is becoming increasingly important to collect and manage all of the data that is available. Most people have no idea just how much data is currently being collected in their homes. But as the future arrives, almost every home will need to aggregate and assess data in order to make informed decisions and take informed actions. When that time arrives for you, you will need a “plug and play” residential data hub. Such devices will become an instrumental part of transforming your household into an efficient information processing system.

Currently, data is collected on your utility usage (e.g., electricity, water, Internet data usage, thermostat settings, etc.). But few people realize that future homes will be collecting enormous amounts of data. We (at the Olsen residence and at Lobo Strategies) have exploited many of the new technologies that are part of the Internet of Things (IoT). Through this experience, we have seen just how much data is now available. We are collecting data about where our family and team members are located. We are collecting data on the physical environment throughout our buildings – including temperature and occupancy. We are collecting information on the internal and external network resources being used by “the team.” And the amount of data being collected today will be dwarfed by the amount of data that will be collected in the next few years.

The Necessity Of Residential Data Hubs

Over the past six months, we have been assembling a huge portfolio of data sources.

  • We use our DNS server logs and firewall logs to collect access-related data.
  • The Home Assistant platform collects data about all of our IoT devices.  [Note: In the past month, we’ve begun consolidating all of our IoT data into a TICK platform.]
  • Starting this week, we are using router data to optimize bandwidth consumption.

While it is possible to manage each of these sources, it takes quite a bit of “integration” (measured in many labor hours) to assemble and analyze this data. But we are now taking steps to assemble all of this data for easy analysis and decision-making.
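
To give a flavor of that integration labor, the sketch below shows the kind of glue we write for the first source: tallying DNS queries per domain from a dnsmasq-style log (the format Pi-hole writes). It is a minimal illustration rather than our production script, and the log path and the top_domains helper are placeholders of my own.

```python
from collections import Counter

# Placeholder path; Pi-hole writes a dnsmasq-formatted log here by default.
LOG_PATH = "/var/log/pihole.log"

def top_domains(log_path, limit=20):
    """Tally dnsmasq 'query' lines per domain and return the most requested ones."""
    counts = Counter()
    with open(log_path) as log:
        for line in log:
            # dnsmasq query lines look roughly like:
            #   Oct 21 09:15:01 dnsmasq[1234]: query[A] example.com from 192.168.1.50
            tokens = line.split()
            for i, token in enumerate(tokens):
                if token.startswith("query[") and i + 1 < len(tokens):
                    counts[tokens[i + 1]] += 1
                    break
    return counts.most_common(limit)

if __name__ == "__main__":
    for domain, hits in top_domains(LOG_PATH):
        print(f"{hits:6d}  {domain}")
```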

Consolidating Router Data

Our ISP put us in a box: they offered us an Internet “data only” package at a seriously reduced price. But buried within the contract were express limits on bandwidth. [Note: Our recent experience has taught us that our current ISP is not a partner; they are simply a service provider. And we will treat them as such in the future.] Due to their onerous actions, we are now on a needed content diet. And as of the beginning of the week, we have taken the steps needed to stay within the “hidden” limits that our ISP imposed.

Fortunately, our network architect (i.e., our beloved CTO) found the root cause of our excessive usage. He noted the recent changes approved by the premise CAB (i.e., our CTO’s beloved wife). And then he correlated this with the DNS log data that identified a likely source of our excess usage. This solved the immediate problem. But what about the irreversible corrective action?

And as of yesterday, we’ve also taken the steps needed for ongoing traffic analysis.

  1. We’ve exploited our premise network decisions. We normally use residential-grade equipment in our remote locations. In candor, the hardware is comparable to its pricier, enterprise brethren. But the software has always suffered. Fortunately, we’ve used DD-WRT in every premise location. By doing this, we had a platform that we could build upon.
  2. The network team deployed remote access tools (i.e., ssh and samba) to all of our premise routers.
  3. A solid-state disk drive was formatted and then attached to the router’s USB 3.0 port. [Note: We decided to use a non-journaled filesystem to limit excessive read/writes of the journal itself.]
  4. Once the hardware was installed, we deployed YAMon on the premise router.
  5. After configuring the router and the YAMon software, we began long-term data collection. (A simplified sketch of this kind of raw collection follows this list.)
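
For readers who want to see the mechanics, here is a minimal sketch of pulling raw interface byte counters off a DD-WRT router over ssh. It assumes the paramiko library and uses a placeholder router address and placeholder credentials; it illustrates the kind of per-interface data that YAMon aggregates for us, and is not YAMon itself.

```python
import paramiko  # pip install paramiko

# Placeholder router address and credentials; substitute your own.
ROUTER_HOST = "192.168.1.1"
ROUTER_USER = "root"
ROUTER_PASS = "changeme"

def interface_counters(host, user, password):
    """Read /proc/net/dev on the router and return bytes in/out per interface."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, password=password)
    try:
        _, stdout, _ = client.exec_command("cat /proc/net/dev")
        counters = {}
        # Skip the two header lines; each remaining line is "iface: rx_bytes ... tx_bytes ..."
        for line in stdout.read().decode().splitlines()[2:]:
            name, stats = line.split(":", 1)
            fields = stats.split()
            counters[name.strip()] = {
                "rx_bytes": int(fields[0]),  # bytes received on this interface
                "tx_bytes": int(fields[8]),  # bytes transmitted on this interface
            }
        return counters
    finally:
        client.close()

if __name__ == "__main__":
    for iface, c in interface_counters(ROUTER_HOST, ROUTER_USER, ROUTER_PASS).items():
        print(f"{iface:10s} rx={c['rx_bytes']:>14d} tx={c['tx_bytes']:>14d}")
```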

Next Steps

While the new network data collection is very necessary, it is not a solution to the larger problem. Specifically, it adds yet another data source (i.e., YADS). So what is now needed is a real nexus for all of these disparate data sources. We truly need a residential data hub: something that stitches together the DNS data, the router data, and the IoT data into a single, consolidated system with robust out-of-the-box analysis tools.
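
As a thought experiment, the stitching could start with something as simple as landing every source in one time series database under a shared tag scheme. The sketch below assumes a local InfluxDB; the database name ("home_hub"), the "source" tag, and the record helper are all my own placeholders, not a finished design.

```python
from datetime import datetime, timezone

from influxdb import InfluxDBClient  # pip install influxdb (the 1.x client)

# Placeholder hub database; every household source shares one schema keyed by a "source" tag.
client = InfluxDBClient(host="localhost", port=8086, database="home_hub")
client.create_database("home_hub")  # no-op if it already exists

def record(source, measurement, fields, tags=None):
    """Write one observation from any household data source into the hub."""
    client.write_points([{
        "measurement": measurement,
        "tags": {"source": source, **(tags or {})},
        "time": datetime.now(timezone.utc).isoformat(),
        "fields": fields,
    }])

# DNS, router, and IoT observations all land in the same place, ready for one set of dashboards.
record("dns", "queries", {"count": 1}, {"domain": "example.com"})
record("router", "bandwidth", {"rx_bytes": 123456, "tx_bytes": 7890}, {"iface": "eth0"})
record("iot", "temperature", {"celsius": 21.5}, {"room": "office"})
```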

I wonder if it is time to build just such a tool – as well as launch the services that go along with the product.

Home Assistant Portal: TNG

Over the past few months, I have spent much of my spare time deepening my home automation proficiency.  And most of that time has been spent understanding and tailoring Home Assistant. But as of this week, I am finally at a point where I am excited to share the launch of my Home Assistant portal. 

Overview

Some of you may not be familiar with Home Assistant (HA), so let me spend one paragraph outlining the product. HA is an open-source “home” automation hub. As such, it can turn your lights on and off, manage your thermostat, and open/close your garage door (and window blinds). It can also manage your presence within (and around) your home. And it works with thousands of in-home devices. It provides an extensive automation engine so that you can script countless events that occur throughout your home. It securely integrates with key cloud services (like Amazon Alexa and Google Assistant). Finally, it is highly extensible – with a huge assortment of add-ons available to manage practically anything.
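
To make that extensibility concrete, HA also exposes a REST API, and a few lines of scripting are enough to drive it from outside. The sketch below is a minimal example; the host name, the long-lived access token, and the light.living_room entity are placeholders for whatever exists in your own installation.

```python
import requests  # pip install requests

# Placeholders: your HA host, a long-lived access token, and an entity of your own.
HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"
HEADERS = {"Authorization": f"Bearer {HA_TOKEN}", "Content-Type": "application/json"}

def turn_on_light(entity_id):
    """Call Home Assistant's light.turn_on service over the REST API."""
    resp = requests.post(
        f"{HA_URL}/api/services/light/turn_on",
        headers=HEADERS,
        json={"entity_id": entity_id},
        timeout=10,
    )
    resp.raise_for_status()

def current_state(entity_id):
    """Fetch the current state of any entity (e.g., 'on', 'off', or a temperature)."""
    resp = requests.get(f"{HA_URL}/api/states/{entity_id}", headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()["state"]

if __name__ == "__main__":
    turn_on_light("light.living_room")
    print(current_state("light.living_room"))
```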

Meeting Project Goals

Today, I finished my conversion to the new user interface (UI). While there have been many ways to access the content within HA before now, the latest UI (code-named Lovelace) makes it possible to create a highly customized user experience. And coupled with the theme engine baked into the original UI (i.e., the ‘frontend’), it is possible to build a beautiful portal to meet your home automation needs.

In addition to controlling all of the IoT (i.e., Internet of Things) devices in our home, I have baked all sorts of goodies into the portal. In particular, I have implemented (and tailored) the data collection capabilities of the entire household. At this time, I am collecting key metrics from all of my systems as well as key state changes for every IoT device. In short, I now have a pretty satisfying operations dashboard for all of my home technology.

Bottom Line

Will my tinkering end with this iteration? If you know me, then you already know the answer. Continuous process improvement is a necessary element for the success of any project. So I expect to make rapid changes almost all of the time – starting almost immediately. And as a believer in ‘agile computing’ (and DevOps product practices), I intend to include my ‘customer(s)’ in every change. But with this release, I really do feel like my HA system can (finally) be labeled as v1.0!

Time Series Data: A Recurring Theme

When I graduated from college (over three-and-a-half decades ago), I had an eclectic mix of skills. I obtained degrees in economics and political science. But I spent a lot of my personal time building computers and writing computer programs. I also spent a lot of my class time learning about econometrics – that is, the representation of economic systems in mathematical/statistical models. While studying, I began using SPSS to analyze time series data.

Commercial Tools (and IP) Ruled

When I started my first job, I used SPSS for all sorts of statistical studies. In particular, I built financial models for the United States Air Force so that they could forecast future spending on the Joint Cruise Missile program. But within a few years, the SPSS tool was superseded by a new program out of Cary, NC. That program was the Statistical Analysis System (a.k.a., SAS). And I have used SAS ever since.

At first, I used the tool as a very fancy summation engine and report generator. It even served as the linchpin of a test-bed generation system that I built for a major telecommunications company. In the nineties, I began using SAS for time series data analysis. In particular, we piped CPU statistics (in the form of RMF and SMF data) into SAS-based performance tools.

Open Source Tools Enter The Fray

As the years progressed, my roles changed and my use of SAS (and time series data) began to wane. But in the past decade, I started using time series data analysis tools to once again conduct capacity and performance studies. At a major financial institution, we collected system data from both Windows and Unix systems throughout the company. And we used this data to build forecasts for future infrastructure acquisitions.

Yes, we continued to use SAS. But we also began to use tools like R. R became a pivotal tool in most universities. But many businesses still used SAS for their “big iron” systems. At the same time, many companies moved from SAS to Microsoft-based tools (including MS Excel and its pivot tables).

TICK Seizes Time Series Data Crown

Over the past few years, “stack-oriented” tools have emerged as the next “new thing” in data centers. [Note: Stacks are like clouds; they are everywhere and they are impossible to define simply.] Most corporations have someone’s “stack” running their business – whether it be Amazon AWS, Microsoft Azure, Docker, Kubernetes, or a plethora of other tools.  And most commercial ventures are choosing hybrid stacks (with commercial and open source components).

And the migration towards “stacks” for execution is encouraging the migration to “stacks” for analysis. Indeed, the entire shift towards NoSQL databases is being paired with a shift towards time series databases.  Today, one of the hottest “stacks” for analysis is TICK (i.e., Telegraf, InfluxDB, Chronograf, and Kapacitor).
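
To make the division of labor concrete: Telegraf ships measurements, InfluxDB stores them, and Chronograf/Kapacitor read them back for dashboards and alerts. The sketch below is the “read them back” half, assuming Telegraf's stock cpu input writing into a local database named telegraf (those are the tool's defaults, not anything special to my setup).

```python
from influxdb import InfluxDBClient  # pip install influxdb (the 1.x client of the TICK era)

# Assumes Telegraf's default cpu input feeding a local InfluxDB database named "telegraf".
client = InfluxDBClient(host="localhost", port=8086, database="telegraf")

# Average CPU idle over the last hour, broken out per reporting host.
query = (
    'SELECT mean("usage_idle") AS idle '
    'FROM "cpu" WHERE time > now() - 1h GROUP BY "host"'
)

for (_series, tags), points in client.query(query).items():
    for point in points:
        print(f'{tags["host"]:20s} idle={point["idle"]:.1f}%')
```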

TICK Stack @ Home

Like most of my projects, this one started when I stumbled onto the TICK stack. I use Home Assistant to manage a plethora of IoT devices. And as the device portfolio has grown, my need to monitor these devices has also increased. A few months ago, I noted that an InfluxDB add-on could be found for HassIO. So I installed the add-on and started collecting information about my Home Assistant installation.

Unfortunately, the data that I collected began to exceed the storage capacity of the SD card in my Raspberry Pi. So after running the system for a few weeks, I decided to turn the data collection off – at least until I solved some architectural problems. And so the TICK stack went on the back burner.
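
In hindsight, part of the fix is simply bounding the data: a retention policy lets InfluxDB age points out automatically instead of filling the card. A minimal sketch, assuming a database named homeassistant and a two-week window (both of which are my assumptions, not the add-on's defaults):

```python
from influxdb import InfluxDBClient  # pip install influxdb (the 1.x client)

# Assumed database name and retention window; adjust both for your own install.
client = InfluxDBClient(host="localhost", port=8086, database="homeassistant")

# Keep two weeks of data and make that the default policy, so old points
# expire automatically instead of exhausting the SD card.
client.create_retention_policy(
    name="two_weeks",
    duration="14d",
    replication="1",
    database="homeassistant",
    default=True,
)
```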

Having solved a bunch of other IoT issues last week, I decided this week to focus on getting the TICK stack operational within the office. After careful consideration, I concluded that the test cases for monitoring would be a Windows/Intel server, a Windows laptop, my Pi-hole server, and my Home Assistant instance.

Since I was working with my existing asset inventory, I decided to host the key services (or daemons) on my Windows server. So I installed Chronograf, InfluxDB, and Kapacitor onto that system. Since there was no native support for a Windows service install, I used the Non-Sucking Service Manager (NSSM) to create the relevant Windows services. At the same time, I installed Telegraf onto a variety of desktops, laptops, and Linux systems. After only a few hiccups, I finally got everything deployed and functioning automatically. Phew!
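
The quickest sanity check I know for a rollout like this is to ask InfluxDB which hosts have actually reported in. A small sketch, again assuming the stock telegraf database and Telegraf's default cpu measurement:

```python
from influxdb import InfluxDBClient  # pip install influxdb (the 1.x client)

# Stock Telegraf defaults assumed: local InfluxDB, database "telegraf", measurement "cpu".
client = InfluxDBClient(host="localhost", port=8086, database="telegraf")

# Every Telegraf agent tags its points with its hostname, so listing the distinct
# "host" tag values shows which machines are actually sending data.
result = client.query('SHOW TAG VALUES FROM "cpu" WITH KEY = "host"')
hosts = sorted(point["value"] for point in result.get_points())

print(f"{len(hosts)} hosts reporting:")
for host in hosts:
    print(" ", host)
```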

Bottom Line

I implemented the TICK components onto a large number of systems. And I am now collecting all sorts of time series data from across the network. As I think about what I’ve done in the past few days, I realize just how important it is to stand on the shoulders of others. A few decades ago, I would have paid thousands of dollars to collect and analyze this data. Today, I can do it with only a minimal investment of time and materials. And given these minimal costs, it will be possible to use these findings for almost every DevOps engagement that arises.

VPNFilter Scope: Talos Tells A Tangled Tale

Hackers want to take over your home.

Several months ago, the team at Talos (a research group within Cisco) announced the existence of VPNFilter – now dubbed the “Swiss Army knife” of malware. At that time, VPNFilter was impressive in its design. And it had already infected hundreds of thousands of home routers. Since the announcement, Talos has continued to study the malware. Last week, Talos released its “final” report on VPNFilter. In that report, Talos highlighted that the scope of VPNFilter is far larger than first reported.

“Improved” VPNFilter Capabilities

In addition to the first stage of the malware, the threat actors included the following “plugins”:

  • ‘htpx’ – a module that redirects and inspects the contents of unencrypted Web traffic passing through compromised devices.
  • ‘ndbr’ – a multifunctional secure shell (SSH) utility that allows remote access to the device. It can act as an SSH client or server and transfer files using the SCP protocol. A “dropbear” command turns the device into an SSH server. The module can also run the nmap network port scanning utility.
  • ‘nm’ – a network mapping module used to perform reconnaissance from the compromised devices. It performs a port scan and then uses the Mikrotik Network Discovery Protocol to search for other Mikrotik devices that could be compromised.
  • ‘netfilter’ – a firewall management utility that can be used to block sets of network addresses.
  • ‘portforwarding’ – a module that allows network traffic from the device to be redirected to a network specified by the attacker.
  • ‘socks5proxy’ – a module that turns the compromised device into a SOCKS5 virtual private network proxy server, allowing the attacker to use it as a front for network activity. It uses no authentication and is hardcoded to listen on TCP port 5380. There were several bugs in the implementation of this module.
  • ‘tcpvpn’ – a module that allows the attacker to create a Reverse-TCP VPN on compromised devices, connecting them back to the attacker over a virtual private network for export of data and remote command and control.

Disaster Averted?

Fortunately, the impact of VPNFilter was blunted by the Federal Bureau of Investigation (FBI). The FBI recommended that every home user reboot their router. The FBI hoped that this would slow down infection and exploitation. It did. But it did not eliminate the threat.

In order to be reasonably safe, you must also ensure that you are on a version of router firmware that protects against VPNFilter. While many people heeded this advice, many did not. Consequently, there are thousands of routers that remain compromised. And threat actors are now using these springboards to compromise all sorts of devices within the home. This includes hubs, switches, servers, video players, lights, sensors, cameras, etc.

Long-Term Implications

Given the ubiquity of devices within the home, the need for ubiquitous (and standardized) software update mechanisms is escalating. You should absolutely protect your router as the first line of defense. But you also need to routinely update every type of device in your home.

Bottom Line

  1. Update your router! And update it whenever there are new security patches. Period.
  2. Only buy devices that have automatic updating capabilities. The only exception to this rule should be if/when you are an accomplished technician and you have established a plan for performing the updates manually.
  3. Schedule periodic audits of device firmware. Years ago, I did annual battery maintenance on smoke detectors. Today, I check every device at least once a month. 
  4. Retain software backups so that you can “roll back” updates if they fail. Again, this is a good reason to spend additional money on devices that support backup/restore capabilities. The very last thing you want is a black box that you cannot control.

As the VPNFilter scope and capabilities have expanded, the importance of remediation has also increased. Don’t wait. Don’t be the slowest antelope on the savanna.

Alexa Dominance: Who Can Compete?

Amazon Echo devices now have a foothold in most American homes.

Voice control is the ‘holy grail’ of UI interaction. You need only look at old movies and television to see that voice is indeed king. [For example, the Robinson family used voice commands to control their robot. And the crew of the Discovery used voice as their means of communicating with HAL.] Today, there are many voice assistants available on the market. These include: Amazon Alexa, Apple Siri, Google Assistant (aka Google Home), Microsoft Cortana, Nuance Nina, Samsung Bixby, and even the Voxagent Silvia. But the real leaders are only now starting to emerge from this crowded market. And as of this moment, Alexa dominance in third-party voice integration is apparent.

Apple Creates The Market

Apple was the first out of the gate with the Apple Siri assistant. Siri first arrived on the iPhone and later on the iPad. But since its introduction, it has become available across the entire Apple i-cosystem. If you are an Apple enthusiast, Siri is on your wrist (with the watch). Siri is on your computer. And Siri is on your HomePod speaker. It is even on your earbuds. And in the past six months, we have finally started to see some third-party integration with Siri.

Amazon Seizes The Market

Amazon used an entirely different approach to entrench its voice assistant. Rather than launch the service across all Amazon-branded products, Amazon chose to first launch a voice assistant inside a speaker. This was a clever strategy. With a fairly small investment, you could have an assistant in the room with you. Wherever you spent time, your assistant would probably be close enough for routine interactions.

This strategy did not rely upon your phone always being in your pocket. Unlike with Apple, the table stakes for getting a voice assistant were relatively trivial. And more importantly, your investment was not limited to one and only one ecosystem. When the Echo Dot was released at a trivial price point (including heavy discounts), Alexa started showing up everywhere.

From the very outset, an Amazon voice assistant required only the purchase of a simple speaker (and not an expensive smartphone). You could put the speaker in a room with a Samsung TV. Or you could set it in your kitchen. So as you listened to music (while cooking), you could add items to your next shopping list. And you could set the timers for all of your cooking. In short, you had a hands-free method of augmenting routine tasks. In fact, it was this integration with normal household chores, coupled with the lower entry price, that helped to spur consumer purchases of the Amazon Echo (and Echo Dot).

A second key feature of Amazon’s success was its open architecture. Alexa dominance was amplified as additional hardware vendors adopted the Alexa ecosystem. And the young Internet-of-Things (IoT) marketplace adopted Alexa as its first integration platform. Yes, many companies also provided Siri and Google Assistant integration. But Alexa was their first ‘target’ platform.

The reason for Alexa integration was (and is) simple: most vendors sell their products through Amazon. So vendors gained synergies with their main retail channel. Unlike the Apple model, you didn’t have to go to a brick-and-mortar store (whether it be the Apple Store, the carriers’ stores, or even BestBuy/Target/Walmart). Nor did a vendor need to use another company’s supply chain. Instead, they could bundle the whole experience through an established sales/supply channel.

Google Arrives Late To The Party

While Apple and Amazon sparred with one another, Google jumped into the market. They doubled down on ‘openness’ and interoperability. And at this moment, the general consensus is that the Google offering is the most open. But to date, they have not gained traction because their entry price was much higher than Amazon’s. We find this tremendously interesting: Google got the low-price part right when they offered a $20-$30 video streamer.

But with the broader household assistant, Google focused first upon the phone (choosing to fight with Apple) rather than a hands-free device that everyone could use throughout the house. And rather than follow the pricing model that they adopted with the Chromecast, Google chose to offer a more capable (and more expensive) speaker product. So while they used one part of the Amazon formula (i.e., interoperability), they avoided the price-sensitive part of the formula.

Furthermore, Google could not offer synergies with the supply chain. Consequently, Google still remains a third-place contender. For them to leap back into a more prominent position, they will either have to beat ‘all-comers’ on price or they will have to offer something really innovative that the other vendors haven’t yet delivered.

Alexa Dominance

Alexa dominance in third-party voice integration is apparent. Not only can you use Alexa on your Amazon ‘speakers’, you can also use it on third-party speakers (like Sonos). You can launch actions on your phone and on your computer. And these days, you can use it with your thermostat, your light bulbs, your power sockets, your garage door, your blinds, and even your oven. In my case, I just finished integrating Alexa with Hue lights and with an ecobee thermostat.

Bottom Line

Market dominance is very fleeting. I remember when IBM was the dominant technology provider. After IBM, Microsoft dominated the computer market. At that time, companies like IBM, HP, and Sun dominated the server market. And dominance in the software market is just as fleeting. Without continually focusing on new and emerging trends, leadership can devolve back into a competitive melee, followed by the obsolescence of the leader. Indeed, this has been the rule as dominant players have struggled to maintain existing revenue streams while trying to remain innovative.

Apple is approaching the same point of transition. Their dominance of the phone market is slowly coming to an end. Unless they can pivot to something truly innovative, they may suffer the same fate as IBM, Sun, HP, Dell, Microsoft, and a host of others.

Google may be facing the same fate – though this is far less certain. Since Google’s main source of revenue is ‘search-related’ advertising, they may see some sniping around the edges (e.g., Bing, DuckDuckGo, etc.). But there is no serious challenge to their core business – at this time.

And Amazon is in a similar position: their core revenue is the supply chain ‘tax’ that they impose upon retail sales. So they may not see the same impact on their voice-related offerings. But they dare not rest upon their laurels. In candor, the Amazon position is far more appealing than the Google position. The Amazon model relies upon other companies building products that Amazon can sell. So interoperability will always be a part of any product that Amazon brands – including voice assistants. 

Only time will sort out the winners and losers. And I daresay that there is room enough for multiple ‘winners’ in this space. But for me, I am now making all of my personal and business investments based upon the continued dominance of Alexa.