Systems Engineering – Page 2

Server Crippled by Updates Again

The February update cycle again sent my server into a reboot loop, shutting down all services until I could diagnose the problem on site.

Following the same steps as in my previous post, I switched the boot choice to Safe Mode and observed another boot failure. This time, instead of getting into the weeds of troubleshooting the update system with a second Safe Mode boot, I let the server go back to the normal boot mode, because some other websites have reported this as a good solution.

In this case, the failed Safe Mode boot followed by no other action did successfully restore the server.

After reviewing the Event Viewer logs, I could only find a repeated Event ID 1074, “TrustedInstaller.exe has initiated the restart”. KB2992611 and KB890830 both installed successfully before the loop, then KB4502496, KB2822241, and KB4537814 installed after the loop.
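The same events can also be listed from a command prompt instead of clicking through Event Viewer. The following is a minimal Python sketch that builds a wevtutil query for Event ID 1074 in the System log. The function names and the default count of 20 are illustrative assumptions, and the query only returns anything when run on the Windows server itself.

```python
import subprocess

def build_restart_query(event_id=1074, count=20):
    """Return a wevtutil command listing the newest matching System events."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",   # XPath filter on the event ID
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # plain-text output
    ]

def run_restart_query():
    """Run the query; only works on Windows, where wevtutil is available."""
    result = subprocess.run(build_restart_query(), capture_output=True, text=True)
    return result.stdout
```

Calling run_restart_query() from an elevated prompt should show which process requested each restart, which is how the TrustedInstaller.exe entries stand out.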

My current recommendation is to disable automatic updates for Windows servers and only perform update checks while on site. Also, run the update check twice in a row. The servicing stack update from December didn’t show up until after recovering from the reboot loop and then checking again for more updates.
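One way to disable automatic updates, assuming the server is not already managed by a domain Group Policy that would overwrite the setting, is the NoAutoUpdate policy value in the registry. The sketch below is only one possible approach, not necessarily the method used here; the helper names are hypothetical, and the write step needs Windows and administrator rights.

```python
AU_KEY = r"SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU"

def auto_update_policy(disabled=True):
    """Return (key path, value name, DWORD data) for the automatic-update policy."""
    return (AU_KEY, "NoAutoUpdate", 1 if disabled else 0)

def apply_policy(disabled=True):
    """Write the policy under HKEY_LOCAL_MACHINE; needs Windows and admin rights."""
    import winreg  # Windows-only module, imported lazily so the helper above runs anywhere
    key_path, name, data = auto_update_policy(disabled)
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, name, 0, winreg.REG_DWORD, data)
```

With the policy in place, updates still install when checked for manually, which fits the habit of checking twice in a row while on site.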

Reboot Loop After KB4525246 Update

Several other sites confirmed recent server failures after running Windows Updates. Here are the basic steps I used to recover.

Attach a keyboard and enter BIOS setup. Make sure Quick Boot is disabled.

Press F8 while restarting the server to open the Advanced Boot Options menu.

I tried Safe Mode, but did not see a successful boot there.

Next I tried Repair Your Computer, which brought me to the “Choose an option” screen.

Select Troubleshoot, then select Command Prompt. Follow the instructions to log in as one of the administrators.

Solving Disconnected Folders Over Wi-Fi

Avoid the First Default Setting for Share Caching

I’m starting to realize that the Offline Files feature in Windows causes more problems than it solves when it comes to unreliable network connections.

In 2016, I described how to minimize the effects of an occasionally high ping when the slow-link mode goes into effect: Offline Files Stay Disconnected

But that doesn’t solve the problem. Fine-tuning or even disabling the slow-link mode forces the Client-Side Cache (CSC) to use its “Action on server disconnect” configuration any time the network isn’t performing perfectly. The default behavior, “Work offline”, treats each affected (meaning cache-enabled) share as being totally unavailable, and the CSC then attempts to retrieve cached copies. This happens even if the server is still available but failed a single ping check.

Why is this still a problem? Well, in practice, most files don’t need to be available offline. By default, the Windows file server is configured, and the Windows client is designed, to let each user mark individual files as “Always available offline” from the file context menu. When a user selects this option, that one file is copied to the CSC, and in theory that one file is always available. This allows for targeted use and minimal sync time. The problem arises with all the other files. When the CSC goes offline and marks the shared folder as disconnected, it effectively blocks access to all the files that were never cached, even if the server and its files are still available.

At this point, you and I now understand the situation that needs to be avoided. We don’t want to have a large number of files under the unnecessary clutches of the CSC, regardless of network quality.

Update 08/17/2018

At first, I thought the solution was to change the file server’s default configuration of allowing users to decide which files are cached. I changed folders that needed maximum online availability to be set to “No files or programs from the shared folder are available offline.” This server setting automatically disables the CSC.

Unfortunately, the result was that the folders configured for offline caching worked great, but the folders configured for no offline caching only worked until some network error or server reboot. In this configuration, once a path became disconnected, an Offline Files message was logged in the Event Viewer, and even though no files were being cached, the entire path became unavailable. At that point, the workstation persistently threw Error 0x80070035 any time that particular path was accessed, until the workstation was rebooted.
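For anyone searching on that error code: 0x80070035 is an HRESULT wrapping Win32 error 53, ERROR_BAD_NETPATH (“The network path was not found”), which matches the symptom of a path that stays unreachable. A small Python sketch of the decoding follows; the function name is illustrative.

```python
def decode_hresult(hresult):
    """Split a 32-bit HRESULT into (is_failure, facility, code)."""
    is_failure = bool((hresult >> 31) & 1)  # top bit: severity
    facility = (hresult >> 16) & 0x7FF      # 11-bit facility field
    code = hresult & 0xFFFF                 # low 16 bits: error code
    return is_failure, facility, code

FACILITY_WIN32 = 7       # the code field holds a plain Win32 error number
ERROR_BAD_NETPATH = 53   # "The network path was not found."

failure, facility, code = decode_hresult(0x80070035)
print(failure, facility, code)  # prints: True 7 53
```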

The only solution I’ve found that works now is to completely disable the Offline Files feature on the workstation. With Offline Files disabled from the Control Panel, the network and server errors are now transient and I am not having any problems with disconnected paths or persistent errors.

Offline Files is ultimately broken and does not improve the Windows experience.

Windows 2012 Can’t Ping NVR Host

I just resolved a long-term problem where one specific Windows 2012 server was unable to ping one specific device on the same LAN.

There were no relevant resources or similar-looking cases on the web. Everything else on this LAN worked normally. The server could ping all other clients, and the clients could ping the server and the NVR. I just could not get the server to ping the NVR for the life of me.

I suspected at one point that this was a routing issue due to my desire for strong security policies around IoT devices. This turned out not to be the case, as I could find nothing wrong with the router or any routing tables.

At last, I decided this problem was so specific that it could be a bug in the NVR itself. In this case, the only thing special about the Windows server from the NVR’s perspective was that the server was providing both DHCP and DNS to the NVR. I tried disabling each service, and found exactly what I was looking for.

The NVR will not respond to pings from its DNS server.

I don’t know why this is broken and don’t really care to investigate any further. The workarounds are either:

Create a DHCP reservation with its own option to specify a 3rd-party DNS server, OR
Disable the NVR’s DHCP client and set a static address with an alternative DNS server address value.

In my case, the NVR does not need to use the local DNS server, so this is an easy fix. So long as my server’s IP address is not used in the NVR DNS configuration, everything works normally and the server can ping the NVR.
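To confirm a workaround like this, it helps to repeat the same single ping before and after changing the NVR’s DNS setting. Here is a rough Python sketch; the function names and the 192.0.2.20 address in the usage note are placeholders, and the exit code is only a rough signal, since Windows ping can report success on some “unreachable” replies.

```python
import platform
import subprocess

def build_ping_command(host, windows=None):
    """Build a single-echo ping command with a short timeout."""
    if windows is None:
        windows = platform.system() == "Windows"
    if windows:
        return ["ping", "-n", "1", "-w", "1000", host]  # -w: timeout in milliseconds
    return ["ping", "-c", "1", "-W", "1", host]         # -W: timeout in seconds (Linux ping)

def responds_to_ping(host):
    """Return True if the ping command exits with status 0."""
    completed = subprocess.run(build_ping_command(host), capture_output=True)
    return completed.returncode == 0
```

Running responds_to_ping("192.0.2.20") from the server, with the placeholder replaced by the NVR’s real address, should flip from False to True once the server’s own IP is removed from the NVR’s DNS configuration.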

High Resource Use by Start Screen

While diagnosing what I thought was a Windows Update failure, I discovered unrelated massive resource consumption and file scanning activity apparently tied to the Start screen in Windows 2012.

Symptoms:

10 to 20% constant CPU usage by Windows Explorer.

Rapid file scanning or Shared Folder usage in the case of folder redirection.

Triggers:

Resource consumption begins immediately after opening the Start screen and performing a keyboard search.

Closing the Start screen does not help.

Workarounds:

Sign out the current user. This action will shut down Windows Explorer, preventing the unwanted symptoms until triggered again by a user.

ownCloud Using Wrong PHP Configuration

The ownCloud community dropped support for Windows Server, so I must resort to documenting such problems here instead of contributing to the open source project.

One major symptom that confirmed ownCloud was using more than one PHP environment on my server was the presence of session handler files in more than one directory. Specifically, I was finding orphaned files in C:\WINDOWS\Temp even though my one and only php.ini production file specified a different path as well as garbage collection.
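A quick way to check for this symptom is to look for old session files in a directory that php.ini does not point to. The Python sketch below assumes PHP’s default file-based session handler, which names its files sess_ followed by the session ID, and uses 1440 seconds (the stock session.gc_maxlifetime) as the age threshold; the function name is hypothetical.

```python
import time
from pathlib import Path

def find_stale_sessions(directory, max_age_seconds=1440):
    """List sess_* files whose modification time is older than max_age_seconds."""
    cutoff = time.time() - max_age_seconds
    return sorted(
        p for p in Path(directory).glob("sess_*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

If find_stale_sessions(r"C:\WINDOWS\Temp") keeps returning files while the configured session path stays clean, some request is being served by a PHP environment that never read the intended php.ini.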

I traced the session file generation as far as the ownCloud calendar “app”, which lives in owncloud\apps\calendar\appinfo\remote.php and related places.

The debugging results were fascinating: not only was the wrong configuration file loaded, but after dumping the full phpinfo() output to disk I also found that the calendar app was running under an entirely different version of PHP.

The culprit: After the most recent PHP upgrade, my site-specific Handler Mappings ended up with mismatched verb restrictions. Somehow the new version ended up restricted to GET,HEAD,POST by default, while the old version remained unrestricted. Although my handlers were in the correct order to give all *.php files to the correct module, any time a CalDAV client sent a PROPFIND or similar request, IIS essentially downgraded to the unrestricted version of PHP.

The solution: Remove verb restrictions for the ownCloud site’s Handler Mappings, and then remove all but one of the PHP Handler Mappings to prevent any other versions from running without throwing errors.
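The fall-through can be illustrated with a simplified model of handler selection, in which the first mapping that matches both the path and the verb wins. This Python sketch is an assumption-laden toy, not IIS’s real matching logic (which also weighs things like preconditions and resource type), and the handler names PHP_new and PHP_old are hypothetical.

```python
from fnmatch import fnmatch

def pick_handler(handlers, path, verb):
    """Return the name of the first handler whose path pattern and verb list match."""
    for name, pattern, verbs in handlers:
        if fnmatch(path, pattern) and (verbs == "*" or verb in verbs.split(",")):
            return name
    return None

# Ordered mappings before the fix: first match wins.
broken = [
    ("PHP_new", "*.php", "GET,HEAD,POST"),  # new version, restricted verbs
    ("PHP_old", "*.php", "*"),              # old version, unrestricted
]
# After the fix: a single mapping with no verb restriction.
fixed = [("PHP_new", "*.php", "*")]

print(pick_handler(broken, "remote.php", "GET"))       # PHP_new
print(pick_handler(broken, "remote.php", "PROPFIND"))  # PHP_old
print(pick_handler(fixed, "remote.php", "PROPFIND"))   # PHP_new
```

Removing the second mapping matters as much as lifting the restriction: with only one PHP handler left, a request that cannot be matched fails visibly instead of silently running under the wrong version.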

If you get a bogus error about spaces in “the path to the script processor” when updating verb restrictions, just add double quotes around the path, and then click “No” on the ensuing bogus error about needing to create a new FastCGI application. (facepalm)

Offline Files Access Denied over VPN

I just tried taking a Windows 10 laptop on the road for the first time. Everything was great until I connected to the VPN. Suddenly, I was getting Access Denied errors and “You do not have permissions” errors for all files made available offline. I confirmed the VPN tunnel was working and even browsed to other shared folders on the same server. The Offline Files errors persisted after dropping the VPN.

When I returned to the domain Wi-Fi, file synchronization completed normally and there were no errors at all.

Am I to believe that Windows 10 is completely incompatible with VPN synchronization? I never had a problem with this on Windows XP, and I am dreading the months of research and experimentation normally involved in fixing this kind of Microsoft failure.

Shortcode Problems: WordPress 4.4

I will briefly summarize Shortcode API changes since WordPress 4.0 and then kick off some ideas for a roadmap.

The first major accomplishment was the expansion of the API documentation, including a new large section I wrote about the formal syntax for shortcode input.

I also put forward a robust parser concept for the function wptexturize() that promised to re-introduce the ability to use unrestricted HTML code inside of shortcodes and shortcode attributes. That concept went through many, many changes before being introduced in v4.2.3. After consulting with the WordPress security team, and after extensive testing of the shortcode parsing functions, we determined that the shortcodes-first parsing strategy was fundamentally flawed and could not be included with any version beyond v4.2.2. This is why I added an HTML parser to the Shortcode API and ultimately curtailed the use of shortcodes inside HTML rather than expand the use of HTML inside shortcodes.


Cookies Not Working in IE10

I’ve finally fixed a crippling bug in Internet Explorer 10 that was preventing me from using any website that required cookie support.

This problem seemed to plague my Windows 2012 server from day one. I’m not yet sure what was special about this configuration. No matter how many settings I changed, every website I visited told me that I had cookies completely disabled.

I used these steps right before the browser started working correctly:

Step 1 – Find the “Delete Browsing History” dialog box.


Shortcode Problems to be Resolved in WordPress 4.1

Figure: Illustration of the wptexturize_parse() concept, achieving correct and exact results in several steps.

When WordPress first introduced its Shortcode API, it included an all-too-simple line of code that was supposed to keep curly quotes from appearing inside shortcode attributes while still adding curly quotes outside of the shortcodes. But there were several known problems with this one line of code, such as what would happen if a URL contained square brackets, and what would happen if a plugin author wanted to use HTML inside a shortcode.
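To give a feel for this class of problem, here is a toy Python sketch. It is not WordPress code: the pattern and both function names are stand-ins invented for illustration. It protects anything that looks like a tag or a bracketed shortcode and curls quotes everywhere else, and it shows how a URL with square brackets inside a shortcode attribute cuts the protected span short.

```python
import re

# Toy "protect tags and bracketed spans" rule; an illustrative stand-in only.
NAIVE_SPLIT = re.compile(r'(<[^>]*>|\[.*?\])')

def curl_quotes(text):
    """Rough curly-quote pass: opening quote at start or after whitespace, else closing."""
    text = re.sub(r'(?:^|(?<=\s))"', '\u201c', text)
    return text.replace('"', '\u201d')

def naive_texturize(text):
    """Curl quotes only in the segments between protected spans."""
    parts = NAIVE_SPLIT.split(text)
    # Odd indices hold the captured tags/shortcodes; leave those untouched.
    return ''.join(p if i % 2 else curl_quotes(p) for i, p in enumerate(parts))

ok = naive_texturize('[link title="Hi"] says "hello"')
bad = naive_texturize('[link url="http://x.test/?a[]=1" title="Hi"]')
print(ok)
print(bad)
```

In the first call the shortcode keeps its straight quotes and only the surrounding text is curled. In the second, the protected span ends at the first closing bracket inside the URL, so the rest of the shortcode is treated as ordinary text and its attribute quotes get curled.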

In version 4.0, I made a substantial effort to fix these problems, but it resulted in some new limitations being placed on the ways shortcodes could be used. Although I couldn’t find any documented examples or official support for the HTML features, I did hear from several members of the WordPress community who enjoy the full power of customizing their website HTML by using shortcode attributes and HTML values.

My proposed solution is to write a new parser function that will exactly identify the shortcodes and HTML elements being used, so that the function wptexturize() will finally be able to create its curly quotes without interfering with shortcode features.
