Category Archives: Development

Securing Passwords

There are many aspects of web security, but one that seems to need particular attention is passwords. Passwords are perhaps the single most sensitive piece of information users keep private online, even more than personal information such as birthdate, social security number, credit card number, bank account number, mother’s maiden name, or favorite childhood pet. Almost all of those could be researched or are known to at least friends or family members, and if any of them are stored by a website, whoever has the password can get in and access them anyway. And since users should go to extreme lengths to keep passwords private, so should websites.

What’s so special about passwords?

Why are passwords so important? Not only are they the keys to the kingdom of the site they are meant for, giving unfettered access to everything in that user’s account (including all of that other sensitive information), but because we humans struggle to remember passwords for dozens of different sites and services, we tend to re-use the same passwords in more than one place, usually in combination with the same email address or username. This means that compromising a user’s password on one site potentially compromises all of their online activity and accounts, which may give access to virtually all of their personal information and communications.

While it’s easy to say “use a long, complicated passphrase that is unique for each site”, in practice we human beings struggle with such a challenge. And while it’s equally easy to say “use a password manager”, the reality is that we often have to log in from different devices, and most people don’t use password managers even on their primary device. That’s where web developers come in: they can make passwords effectively unique while following other best practices in handling them.

Recommendations

A lot of these recommendations are basic, but I still run into sites that obviously store passwords as plaintext – they’ll even helpfully email your password to you if you forget it! Haven’t we all heard enough times about site break-ins that exposed the entire password database? It’s happened to countless sites large and small, and in combination with the extreme speed of password hash cracking these days, that’s bad news. Then there are the many sites using regular unencrypted http connections during login, exposing your password to anyone who happens to be sniffing the network somewhere along the line, which is easily done on many WiFi connections.

So here are some rules for handling passwords:

  1. Never transmit a password over an unencrypted connection. If your site allows user sign-ups, even for “casual” accounts with little sensitive personal information, you MUST get a signed SSL certificate (maybe $50 per year) for your site, and not only enable SSL, but also at least strongly encourage, or preferably require, the use of https in order to log in.
  2. Client-side hash your passwords using Javascript. This is actually quite an unusual step to take, largely because it doesn’t do anything to improve the security of your site itself. What it does do is help protect the password a user types into the browser with their keyboard. Yes, someone intercepting the hashed value in transit could use it in a “replay” attack to access that particular site, but they could not easily use it to log into another site as the same user. (A minimal sketch of this approach follows this list.)
    • Of course you shouldn’t hash the password alone; it should be hashed together with the unique username (to distinguish it from anyone else using the same password on that site) and a string unique to the site itself (to distinguish it from the same username/password combination on a different site).
    • You need to enable an identical backup hashing capability on the server in case Javascript is disabled in the browser, but for most users while setting their password or logging in, the server (not to mention the network connecting to the server) should never know what password the user actually typed into it.
    • Yes, you still need to hash again on the server (see next), and yes the connection should be encrypted already (see previous), but this is a “defense in depth” measure.
    • Use the best (hardest to crack) hashing function available, but CPU and memory limits, particularly on mobile devices, might preclude using the same method as is used for the normal server-side hashing mentioned next. (Note: since few if any websites use client-side hashing to help protect the uniqueness of user passwords across sites, users themselves can use a tool like this secure password generator, explained here. This code might be a good starting point for client-side password hashing as well.)
  3. Hash received passwords on the server using scrypt before storage or comparison to a stored password verifier. For this, do not use MD5, do not use SHA-1, don’t even use SHA-256 or SHA-512 (though I guess you could use one of the latter with enough rounds of hashing, if you must). Use scrypt, or at worst bcrypt or pbkdf2 (though either of those can be cracked thousands of times faster than scrypt). Bcrypt is designed to be expensive to compute even when cracking is spread across a large number of CPUs; scrypt has the added property of being memory-hard, making it expensive to parallelize on specialized hardware as well as on CPUs. (A server-side sketch using scrypt follows this list.)
  4. Keep the password file as secure as possible. Make sure it’s readable and writable only by the proper users, audit access to it, and preferably even keep it on a separate server from the main web server, with physical security, limited functionality, different/limited user access, no direct access to the Internet, and only allowing connections to it from the web server and only for purposes of querying the “is this password good for this username?” service.
  5. Introduce a delay between login attempts after too many wrong guesses, and make this delay increase gradually the more wrong guesses are made. For instance, after five wrong guesses, add a five-second delay, then increase it to 10 seconds, then 20, etc. Don’t ever actually lock users out of their account, just make them wait longer periods of time before being able to guess again. (Of course after that time expires without another attempt, roll back the delay until eventually they would have no delay again. A sketch of such an escalating delay follows this list.)
  6. Encourage strong passwords and forbid especially weak ones. A “strength meter” (color coded and/or numerical/length) can encourage stronger passwords (longer and with more entropy), but there should also be a minimum requirement to accept a password. I’d suggest the following as a minimum (a sketch of these checks follows this list):
    1. Require at least eight characters, but strongly encourage more (and allow a large number, perhaps as many as 256 characters).
    2. Require the password contain at least two types of characters (from lowercase letters, uppercase letters, numbers, or symbols), but strongly encourage at least three and preferably all four types of characters be present, as well as more than one of each.
    3. Load a list of the most common 1000 to 10,000 passwords into the browser in Javascript, memory permitting (not on the server side), and compare the typed password against the list to warn the user it’s too easily found with a dictionary attack, perhaps saying “that is the 37th most common password known to password crackers, you might want to choose something more original”.
  7. Institute a solid password recovery process. I think I’ll save this for a separate post, as it can be pretty involved.
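To make point 2 concrete, here is a minimal sketch of client-side hashing using the browser’s Web Crypto API. The site tag constant, the iteration count, and the function name are my own illustrative choices rather than a prescribed implementation; the point is that the value submitted to the server is derived from the password, the username, and a site-specific string, so the raw password never leaves the browser.

```typescript
// Hypothetical site-unique string; any stable value unique to this site will do.
const SITE_TAG = "example.com-login-v1";

async function clientSideHash(username: string, password: string): Promise<string> {
  const encoder = new TextEncoder();
  // Import the typed password as key material for PBKDF2.
  const baseKey = await crypto.subtle.importKey(
    "raw",
    encoder.encode(password),
    "PBKDF2",
    false,
    ["deriveBits"]
  );
  // Salt with username + site tag so the same password differs per user and per site.
  const bits = await crypto.subtle.deriveBits(
    {
      name: "PBKDF2",
      hash: "SHA-256",
      salt: encoder.encode(`${SITE_TAG}:${username}`),
      iterations: 100_000, // tune to what low-end mobile devices can tolerate
    },
    baseKey,
    256
  );
  // Hex-encode the derived bits; this value is submitted in place of the raw password.
  return Array.from(new Uint8Array(bits))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```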
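For point 3, here is a sketch of server-side hashing with scrypt using Node’s built-in crypto module (assuming a Node server; the 16-byte salt and 64-byte derived key are illustrative parameter choices).

```typescript
import { scrypt, randomBytes, timingSafeEqual } from "node:crypto";
import { promisify } from "node:util";

const scryptAsync = promisify(scrypt) as (
  password: string | Buffer,
  salt: string | Buffer,
  keylen: number
) => Promise<Buffer>;

// Hash a received (already client-side-hashed) password for storage, as "salt:hash".
export async function hashForStorage(received: string): Promise<string> {
  const salt = randomBytes(16);
  const hash = await scryptAsync(received, salt, 64);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Compare a received password against the stored verifier in constant time.
export async function verifyPassword(received: string, stored: string): Promise<boolean> {
  const [saltHex, hashHex] = stored.split(":");
  const hash = await scryptAsync(received, Buffer.from(saltHex, "hex"), 64);
  return timingSafeEqual(hash, Buffer.from(hashHex, "hex"));
}
```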
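For point 5, a minimal sketch of an escalating delay between failed attempts. The in-memory map and the doubling schedule are assumptions for illustration; a real deployment would persist this per-account state and also decay the failure count over time so the delay eventually rolls back to zero.

```typescript
// Per-username failure tracking (in-memory for illustration only).
const failures = new Map<string, { count: number; lastAttempt: number }>();

// After five wrong guesses: 5s, then 10s, 20s, and so on. Never a hard lockout.
function requiredDelayMs(username: string): number {
  const record = failures.get(username);
  if (!record || record.count < 5) return 0;
  return 5000 * 2 ** (record.count - 5);
}

function canAttemptLogin(username: string): boolean {
  const record = failures.get(username);
  if (!record) return true;
  return Date.now() - record.lastAttempt >= requiredDelayMs(username);
}

function recordFailedLogin(username: string): void {
  const count = (failures.get(username)?.count ?? 0) + 1;
  failures.set(username, { count, lastAttempt: Date.now() });
}

function recordSuccessfulLogin(username: string): void {
  failures.delete(username);
}
```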
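And for point 6, a sketch of the minimum acceptance checks: length, character variety, and a lookup against a common-password list loaded into the page. The commonPasswords array and the wording of the messages are placeholders.

```typescript
// Assumed to be the top 1,000 to 10,000 passwords, fetched once and cached in the page.
declare const commonPasswords: string[];

// Returns a warning message, or null if the password meets the minimum requirements.
function checkPasswordMinimums(pw: string): string | null {
  if (pw.length < 8) {
    return "Passwords must be at least eight characters long.";
  }
  const classes =
    Number(/[a-z]/.test(pw)) +
    Number(/[A-Z]/.test(pw)) +
    Number(/[0-9]/.test(pw)) +
    Number(/[^A-Za-z0-9]/.test(pw));
  if (classes < 2) {
    return "Use at least two character types (lowercase, uppercase, numbers, symbols).";
  }
  const rank = commonPasswords.indexOf(pw);
  if (rank !== -1) {
    return `That is #${rank + 1} on the list of passwords best known to password crackers; you might want to choose something more original.`;
  }
  return null;
}
```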

Handling security isn’t necessarily easy, but not trying to do so is a grave disservice to a site’s users, even if the site involves little or no personal information, no commerce of any kind, etc. And properly handling passwords is one of the keystones of good security for sites with user logins.


Opportunistic Encryption and How Browsers Handle Certificate Problems

As a follow-up to my last post on improving privacy on the Internet, I ran across the concept of opportunistic encryption, which I’ve heard about before but has never seemed to go anywhere.

Opportunistic encryption seems most interesting at the TCP layer, so that it is transparent to not only the user, but to applications that use the network as well. However, there are technical challenges to successfully implementing it without introducing undue complexity or noticeable reductions in performance. Such schemes have also never been accepted by a standards body, so their chance of widespread adoption seems slim (though you can try one such scheme, TCPCrypt, already; however, it requires the other end of your communication to have TCPCrypt installed as well, which seems unlikely in most cases).

Thus, as I noted in the last post, web and email seem to offer the best opportunities for adding encryption that’s transparent to the user.

How web browsers handle encryption problems

This leads us to https, the security and privacy protocol for web browsing. As I said previously, we’d like to encourage as many web servers as possible to support, and preferably even mandate, the use of SSL/TLS for web browsing. The web developers, systems administrators, and internet engineers out there can certainly help make that happen.

But there are lots of things to get right when implementing web security. Getting them wrong can make you susceptible to various kinds of attacks, mostly based on some form of man-in-the-middle. That’s why browsers go to such lengths to warn users about problems, often denying access to the site if a problem is detected, until the user explicitly overrides this warning.

But is this the right behavior to take? Is badly-configured encryption really worse than no encryption at all? Web browser vendors sure seem to think so, but I disagree. While a misconfiguration such as a mismatch between the domain named in the certificate and the actual hostname may be a sign of a man-in-the-middle attack, in my experience it’s almost always due to something else. Similarly, self-signed or expired certificates are extremely unlikely to indicate a man-in-the-middle attack. And while none of these situations is ideal, they are all almost always far better than having no encryption at all.

Undesired behavior

So what actually happens when a server has a misconfigured certificate and the browser throws up a big warning? The user can ignore the warning (which is potentially dangerous, but actually fine 99% or more of the time), switch to insecure http (which is, at best, the same as continuing with the untrusted encryption, but much worse the vast majority of the time), or stop using the site entirely, which hurts both them and the business and is usually unnecessary, since the chances of an actual man-in-the-middle attack are slim.

When the operator of the site sees the problem, they may choose to fix it – but they might just choose to disable https instead (and aside from e-commerce sites, I’d suspect the latter is more likely, at least in the short term). Yes, they should fix it, but more often than not they are not going to.

The net result of these browser warnings is scaring and confusing users without increasing their security, since between the users and the website owners, the most likely course of action is to either ignore the warning and proceed (which browser vendors have combatted with ever more dire and difficult to bypass warnings), or to revert to the even-worse unsecured http.

False sense of security

But at least from the point of view of opportunistic encryption, encryption using an expired, weak, self-signed certificate is vastly preferable to no encryption at all. The only danger is providing a false sense of security. But browser vendors have done exactly that by turning everything on its head, by making totally unsecured connections seem preferable to many sorts of encrypted connections, since the unsecured connections do not throw up warnings in the browser!

We need to encourage the use of https connections on the Internet, and part of encouraging its use means not discouraging it where the implementation is not perfect. While we should encourage proper implementations most of all, we should also encourage opportunistic encryption as better than no encryption, even if we aren’t guaranteeing privacy or integrity in the face of man-in-the-middle attacks (which take some effort and are quite rare in the grand scheme of things).

How to fix this?

The fix should actually be simple: change how web browsers communicate encryption problems to users, and above all, how that communication compares to the way they treat totally unencrypted connections.

I propose a “sliding scale” of perceived security. In the browser bar, the scale could be represented by a range of colors and icons, as follows (a rough sketch of how such a classification might look in code follows the list):

  • UNENCRYPTED: Non-https connections would always be highlighted in red. Use of “null” encryption ciphers would also put a connection in this category. In addition, I’d suggest a “bullhorn” or similar icon to communicate that you are broadcasting your activity to the world (a typical radio broadcast icon could work too, but could be confused with wifi). When clicking for more detail, it could warn the user as follows:
    • THE BAD:
      • Your connection is unencrypted. Anyone on the Internet could listen in and see what you’re doing, including viewing your password if you are logging in, could modify or replace the content sent between you and the server without your knowledge, or could be logged in as you and have full access to your account.
  • INSECURE ENCRYPTION: This would be used for various kinds of encryption which have problems that could leave them susceptible to or be a sign of a man-in-the-middle attack, such as self-signed certificates, revoked or long-since expired certificates, or certificates for a domain which does not match the hostname, but where the encryption is still useful for opportunistic encryption and protecting from casual observers. Use of particularly insecure types of encryption (weak or compromised ciphers such as “export” ciphers, too-short key length, etc.) could also contribute to showing up in this category. These should be signified by a broken or unlocked lock icon. Clicking for more detail could notify the user as follows:
    • THE BAD:
      • The certificate used by this site is [unsigned/signed for a domain that does not match the actual hostname/expired/revoked], and thus does not guarantee protection from a man-in-the-middle attack. (Along with more detail, such as a comparison of the domain name for the certificate with the actual host name, the date the certificate expired or was revoked, and a note that certificates could be revoked due to knowledge that the encryption keys have been stolen or misused.)
      • (possibly) The encryption in use is considered weak enough to be easily cracked in a reasonable time by “brute force” methods.
    • THE GOOD:
      • Your connection is encrypted, so your activities cannot be viewed by casual observers monitoring traffic on the Internet.
      • Man-in-the-middle attacks take some effort to mount and are fairly rare, so most likely your connection is secure and the warning is due to a much more mundane misconfiguration; however, there is no way to guarantee it.
  • SEMI-SECURE ENCRYPTION: This might have some kind of closed or almost-closed (maybe closed, but with a crack) lock icon. It would be a variant of the above, but where the “misconfigurations” were considered more minor, such as:
    • Signed for a subdomain that doesn’t match the hostname exactly, but shares the same overall domain name. For instance, a certificate signed for “users.mysite.com” would be considered semi-safe if used on “www.mysite.com” (or any other *.mysite.com), even though it’s not an exact match.
    • Recently expired, for instance within the last 90 days.
    • Encryption that may have some weaknesses, but is considered secure against anyone short of the NSA, and probably not super easy for even the NSA to crack in a reasonable time and on a wide scale.
  • SECURE CONNECTION: This would be used for connections that are considered fully secure: a properly signed (by a trusted certificate authority), unexpired and unrevoked certificate which matches the hostname. The connection should also be using the strongest cipher suites available. These would have a closed lock icon. Clicking for more detail could notify the user as follows:
    • THE GOOD:
      • Your connection is encrypted, so your activities cannot be viewed by observers monitoring traffic on the Internet.
      • The certificate used by this site is properly signed by a certificate authority, is not expired or revoked, and matches the hostname it is signed for, protecting you from man-in-the-middle attacks.
  • Extended validation: Much is made of extended validation certificates, which verify more information about the identity of the site using the certificate, and in the case of e-commerce it may make some sense to help trust who you are giving your money to. But I think they are more a means to increase profits for the certificate vendors, and I think the visual differentiation they are given is wholly unwarranted. Even a site with an EV certificate could take your money without shipping you the product you ordered, charge more than agreed, sell your information to others, or otherwise cheat you; they could also be just as likely to allow NSA access to their private encryption key (either through cooperation or hacking). And most sites without EV certificates are probably perfectly trustworthy even if they didn’t bother to pay 10x as much to get their certificate. However, it could add a green checkmark across the lock icon and an additional benefit to the “Good” category when clicking for more detail:
    • THE GOOD:
      • Your connection is encrypted, so your activities cannot be viewed by observers monitoring traffic on the Internet.
      • The certificate used by this site is properly signed by a certificate authority, is not expired or revoked, and matches the hostname it is signed for, protecting you from man-in-the-middle attacks.
      • The domain for this website has undergone extended validation of the identity of its owner.
  • Forward secrecy: Using ephemeral cipher suites to achieve “perfect forward secrecy” is also highly desirable, and such sites should be differentiated with an even more secure-looking icon (or at least sparkly/magical/happy-looking) and an additional benefit:
    • THE GOOD:
      • The encryption keys change each time you connect, so gaining the master keys will not allow an attacker to see your past or future activities.
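
To illustrate how a browser (or an extension) might map certificate state onto this proposed scale, here is a rough classification sketch. The category names, the 90-day grace period, and the fields of the hypothetical CertInfo structure are drawn from the descriptions above; they are my own simplification, not any browser’s actual API, and they gloss over details such as null ciphers and degrees of cipher weakness.

```typescript
type SecurityLevel =
  | "UNENCRYPTED"
  | "INSECURE_ENCRYPTION"
  | "SEMI_SECURE_ENCRYPTION"
  | "SECURE_CONNECTION";

// Hypothetical summary of a connection's certificate and cipher state.
interface CertInfo {
  encrypted: boolean;
  selfSigned: boolean;
  revoked: boolean;
  daysSinceExpiry: number | null; // null if not expired
  hostMatches: "exact" | "same-domain" | "mismatch";
  weakCipher: boolean;
}

function classify(info: CertInfo): SecurityLevel {
  if (!info.encrypted) return "UNENCRYPTED";
  if (
    info.revoked ||
    info.selfSigned ||
    info.hostMatches === "mismatch" ||
    (info.daysSinceExpiry !== null && info.daysSinceExpiry > 90) ||
    info.weakCipher
  ) {
    return "INSECURE_ENCRYPTION";
  }
  if (
    info.hostMatches === "same-domain" ||
    (info.daysSinceExpiry !== null && info.daysSinceExpiry <= 90)
  ) {
    return "SEMI_SECURE_ENCRYPTION";
  }
  return "SECURE_CONNECTION";
}
```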

Three Ways Web Developers Can Improve Internet Privacy

With all the revelations about out-of-control government spying on the Internet, a great deal of attention has been paid to:

  1. Political changes, such as new laws and legal interpretations. This, of course, is at the core of the problem – what they’re doing should not be legal, or if it’s already illegal, more effort should be made to notice when it’s happening and stop it, and somebody should be getting in trouble for doing it. However, there will be a lot of resistance to this, and change will take a lot of time and likely be incomplete.
  2. “NSA-proof” privacy solutions, such as end-to-end encrypted email or chat, or using TOR to browse the web. While no solution is really “NSA-proof” in the end (especially if they target your actual computer), a lot of solutions can come reasonably close. But end users often find such solutions inconvenient to use, or may not even be aware of them. Worse, they may not feel they have anything to hide from the government or are skeptical they’d be targeted for attention; indeed, we are aware that using such tools explicitly DOES single you out for attention from three-letter agencies.

These approaches are not only laudable, but critical – they are necessary to protect against determined, focused attacks by three-letter agencies. But there are many other things that can be done to protect against casual “hoovering” of information on the Internet. Part of the problem is simply this: it’s too convenient to access most information by casual listening, because there isn’t even a pretense of privacy or security when information is transmitted without any encryption at all. This leaves a very large amount of internet traffic unencrypted for them to sift through without needing to crack or otherwise bypass any form of encryption.

But what if we made encryption the default for more traffic? While it would still be feasible for the NSA to crack or bypass much of that encryption when they really wanted to (by hacking your computer to install a key logger, for instance, or requiring a service provider to hand over your data), merely enabling encryption where it is currently missing could vastly reduce the amount of unencrypted traffic flowing through the “pipes”, meaning it would cost a lot more to sift through, while also making it more difficult to target encrypted traffic for special treatment as “suspicious activity”.

Most encryption beyond whatever happens to be enabled by default turns out to be too difficult for most users to deal with. We also can’t control what access the government has to Google, Microsoft, Yahoo, and Facebook that bypasses the https connections to their servers. But as engineers working on all the other websites and servers out there, we do have control over a lot of other things.

There is much that can be improved: security and privacy on the Internet are shockingly bad, and not just because the NSA is really good at their job (though part of their job is supposed to be strengthening our cyber-security, a task I believe they are failing at). A lot of this is caused by laziness on the part of developers, sysadmins, and internet engineers, as well as a lack of understanding, priorities, or budget from managers.

But many of these changes don’t really take that much time, and aside from that, often the only cost is that of a signed SSL certificate, available for as low as $50 per year.

While there are many security tips for how to lock down your server and network, here I will only talk about simple steps you can take to increase the “background noise” level of security and privacy of communications over the Internet. Here are some suggestions:

  1. Enable HTTPS/SSL on your web server. I’ll talk about this more below.
  2. Enable TLS for SMTP on your mail server. While it is probably not feasible to force the use of TLS at all times (many mail servers may still not support it), at least enabling it on yours increases the odds of email transfers between servers being encrypted.
  3. Disable FTP and telnet in favor of SFTP and SSH. You don’t want to be talking to your server or transferring files over non-private connections when there are secure alternatives that are just as easy to use.

These three steps, taken by the administrators of many sites around the Internet, could end up encrypting a large amount of traffic that is currently sent as plaintext.

Enable HTTPS/SSL on your web server

This is perhaps the most obvious one, as the web is probably the biggest activity people use the Internet for, and whether a site is secure or not is immediately visible to users.

What does it take?

  1. Install a certificate and encryption keys. In order to protect against man-in-the-middle attacks, this should be bought from a legitimate certificate authority, rather than using a self-signed certificate. However, aside from e-commerce sites, where there’s extra value in trusting who you’re about to give your credit card number to, there’s not much benefit to so-called “Extended Validation” certificates aside from more profit for the certificate vendor.
  2. Enable port 443 on your web server, referencing the keys that were installed in step one.
  3. Make sure your web pages work properly over SSL, most particularly that they don’t include any insecure content that would trigger “mixed content” warnings in the browser. This includes CSS and JS files, images, and background images referenced from the CSS.
  4. Make your SSL as secure as it can be. This includes:
    1. Using at least 2048-bit encryption keys.
    2. Enabling “perfect forward secrecy” by enabling the needed “ephemeral” cipher suites and making their use preferential, as well as making sure TLS Session Tickets are disabled.
    3. Disabling weak cipher suites, such as anonymous, null, or export ciphers, as well as avoiding Dual_EC_DRBG, which appears to have been “back-doored” by the NSA.
    4. Protecting against BEAST and CRIME attacks by upgrading to TLS 1.2, de-prioritizing vulnerable cipher suites (unfortunately there is no clear approach that works in all situations), and disabling TLS compression.
  5. Make encryption mandatory by implementing a global 301 or 302 redirect from port 80 to the same URL on port 443, and updating all your internal links to reference https (a minimal sketch of such a redirect follows this list).
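
For step 5, here is a minimal sketch of such a redirect using Node with Express (assuming Express is in use; Apache and nginx have equivalent one-line rewrite rules).

```typescript
import express from "express";

const app = express();

// Permanently redirect any plain-http request to the same URL over https.
// Assumes this app terminates TLS itself or sits behind a proxy that sets req.secure.
app.use((req, res, next) => {
  if (req.secure) return next();
  res.redirect(301, `https://${req.headers.host}${req.originalUrl}`);
});
```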

Idea for Google: Tools for authors to make their text more translatable?

EDIT: I just did another search, and finally came up with some reference to a very similar idea here (“Dialog-Based Machine Translation”), though the implementation appears to be somewhat different: http://wam.inrialpes.fr/publications/2005/DocEng05-Choumane.pdf

Several differences:

    • They’re talking about integration with a machine translation system per se; I’m talking about pre-tagging the source text to make future automated translation easier (though providing round-trip access to Google Translate or similar would be a very helpful adjunct part of the tool, to know which parts of the document need disambiguation).
    • They talk about maintaining a parallel document of some kind, using tags in the source document to reference it; I propose that it would be simpler to maintain only tags directly within the source document, and that this approach would also make later automated translation of full web pages (integrated with other styling etc.) easier.
    • They talk about the system telling the user when it’s confused and asking questions, which it then maintains in an “answer tree”; I propose that authors won’t have access to that information, and will just need to review the round-trip translation to understand where confusion is arising.
    • Besides which, theirs is just an academic paper; if it’s been implemented in some commercial product, I doubt many people (aside from professional translators) are using it. I want this to be an extremely widespread, cheap mechanism that any website could use.

Original post:

I’ve had this idea for a while (at least a few years now, maybe 5 or more? I’d have to recover some old computers to see when I first noted it down).

The basic idea is this: provide a way for authors of online material to “tag” their texts with disambiguation information that would help translation engines more easily glean the meaning of the original text. Some advantages of such a system:

  • No knowledge of other languages would be needed for an author to improve the translatability of their text.
  • The tags, once entered in the source language, could ease the automatic translation into any and all target languages.
  • It would not require actually changing or rewriting the source text – just tagging with additional information.
  • Any translation engine that understands the standard could take advantage of the additional information. Human translators could benefit from the additional information as well.

Specifically, provide a syntax (probably for xml and html, such as spans with appropriate “data” attributes) for tagging groups of one or more words within a text with disambiguation information. This could be of the form of: a code for “proper noun, don’t translate”, a reference to a specific meaning (“noun, sense 2” – though probably in the form of a unique identifier for that particular entry) within a specified online dictionary, a reference to an idiom dictionary to define a phrase, a reference to another word within the sentence (“he” refers to “Chuck”), etc.
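
As an illustration of what such tagging and its consumption might look like, here is a sketch. The attribute names (data-tr-type, data-tr-sense, data-tr-ref) are my own invention rather than an existing standard; the code simply shows how a translation engine could collect these hints from a page before translating.

```typescript
// Hypothetical markup of the kind proposed above:
//   <span data-tr-type="proper-noun">Chuck</span> said
//   <span data-tr-ref="Chuck">he</span> went to the
//   <span data-tr-sense="somedictionary:bank#noun-2">bank</span>.

interface TranslationHint {
  text: string;
  type?: string;  // e.g. "proper-noun": do not translate
  sense?: string; // identifier of the intended dictionary entry
  ref?: string;   // antecedent that a pronoun refers to
}

// Collect all disambiguation hints embedded in a document or fragment.
function collectHints(root: ParentNode): TranslationHint[] {
  const tagged = root.querySelectorAll<HTMLElement>(
    "[data-tr-type], [data-tr-sense], [data-tr-ref]"
  );
  return Array.from(tagged).map((el) => ({
    text: el.textContent ?? "",
    type: el.dataset.trType,
    sense: el.dataset.trSense,
    ref: el.dataset.trRef,
  }));
}
```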

The translation engine (any translation engine) could then take advantage of this embedded metadata to better understand the source text’s meaning, and thus translate more accurately into other languages.

The implementation on the authoring end might be a text editor with a “translation helper” plug-in. The author could select text (one or more words), and use the translation helper to add a disambiguation, such as selecting “proper noun”, selecting the appropriate dictionary meaning (which the helper would look up automatically based on the selected word), search for an appropriate idiom, enter a replacement word or synonym, etc.

This could be supplemented by a “round-trip” translation tool, which translates the text into a selected target language, then back into the original language. Authors could then concentrate on areas that produce the most confused output – that is, we don’t want them to have to laboriously tag everything on the page, just the problem areas. Similarly, they could start with one language, then check in other languages to see if additional disambiguation is needed.

As time goes on and translation engines like Google Translate get “smarter” at gleaning meaning from context, the need for such tagging might be reduced. But in the meantime, it could also help with machine learning, i.e., the translation engine guesses, then compares its guess to the entered tag to see if its guess was correct.

Again, the key difference here compared to “assisted” translation systems is that the “operator” needs no knowledge of any but the source language. This isn’t about providing hints to help translate into any particular language, but rather hints as to the meaning of the source text.

On the other hand, I wouldn’t want this to get bogged down in more-general efforts to promote the “semantic web” for uses other than translation. Therein lies a burial in the bowels of an obscure W3C proposal or RFC.

There are several reasons I think Google should be the main drivers of such a standard:

  • They are the biggest online translator (as far as I know), and hence would be the largest user of the resulting data.
  • They are capable of providing and hosting the needed online “tagging” dictionaries for authors to reference.
  • They are capable of driving ad-hoc standards like new HTML attributes (see ‘rel=”nofollow”‘, “canonical” meta tag).

In fact, since I am not interested in (or capable of) implementing machine translation engines or online dictionaries myself, I don’t see how this idea can go anywhere without Google. (Yes, I know about Yahoo’s Babelfish and Microsoft’s Bing Translator, and that there are others, but I think they are all bit players compared to Google, and also not the ones to drive a standard. And I’d rather see this implemented quickly than debated in committee for the next ten years.)

Google Translations team: does this sound at all interesting?


Drupal: a way forward by focusing on the right layer – and the right user

Dilemmas
There is much talk nowadays about Drupal needing to strip things out of core to make it more maintainable. I agree with the sentiment.

On the other hand, there are many people who feel that the process of moving frequently-used contributed modules into core should continue, since these are things that every site builder needs for every site. I agree with this sentiment, too.

There is also much talk about Drupal’s usability, and that it needs to be improved. I agree with this sentiment as well. (Everyone does.)

However, moving more things into core won’t help with the problem of maintainability that core developers complain of. But moving things out will hurt Drupal’s usability and usefulness for site builders. These two goals are largely at odds with each other, though they both need to happen. How do we resolve this dilemma?

Similarly, focusing too much on usability for “novice” users won’t help site builders much. It also won’t help novice users if things like Blog and Forum are moved out of core (as they should be). No matter how “easy” Drupal becomes, these users will still be thoroughly confused by all the concepts involved. Simply put, Drupal is not the system for them, and Drupal should not focus on their needs. Yet we do want to accommodate such users. How do we resolve this dilemma?

Solutions
I believe that in order to resolve these dilemmas, people need to start thinking about Drupal differently. Otherwise, if we go too far along any of these paths, Drupal will continue to be a confused mess that doesn’t fit anyone’s needs very well.

When I look at Drupal, I see three layers corresponding to what I feel are the three main audiences that interact with it:

  1. Code/API/Framework layer: this corresponds to core coders, module developers, and experienced programmers building highly-custom site solutions in code (i.e., writing their own custom modules)
  2. Site Building layer: this corresponds to the typical site builders – fairly experienced web developers who have a good grasp of concepts like using modules to extend site functionality, tweaking the site by customized settings, differentiating different types of content, defining content types to include discrete fields of information, and using blocks, Views, layouts, etc. to show those chunks of content variously in different places on the site.
  3. Complete Solutions layer: this corresponds to end-user site owners and less experienced developers. Basically anyone that wants to put up a “blog” or a “forum” or a “gallery” or a “brochure site” for their business with the minimum of effort (push a button), and doesn’t understand (or want to have to understand) the concepts in the previous layer.

The first layer is important – it needs to be available to build highly-customized, high-performance sites with special needs. It’s also what all the other layers are built on top of. But this layer should not be the primary focus of Drupal. Why? It’s not where Drupal shines above all others – if this is the target, then the competition is all the other frameworks out there – Symfony, CodeIgniter, CakePHP, Zend, Ruby on Rails, Django, etc. Is Drupal really better at that kind of site-building than all these other frameworks? Maybe, but even if so, is that the battle we want to focus on?

The third layer is also important – there is a huge audience of people out there who want to be able to throw up various kinds of sites with a “1-click” installation, and Drupal should be able to meet their needs. But this layer should not be the primary focus of Drupal. Why? First, it’s not where Drupal shines above all others – Drupal (even with the Blog or WP Blog module) is not a better out-of-the-box blog platform than WordPress. Advanced Forum is probably not a better out-of-the-box forum than various other forum-specific options, etc. Further, even if these are included in a fairly easy-to-use installation profile, they’re still never going to be easier to install than a hosted solution like WordPress.com or Tumblr, where no installation is required at all. This is a battle we simply cannot win, and are already coming from a position of considerable weakness.

No, it is the second layer that is Drupal’s strength, and where the nexus of development and usability should be focused. This brings us to the next big problem when it comes to moving Drupal forward: a lack of focus. To put it bluntly, the Drupal community that discusses usability and what should be in core has not decided who its audience is (or has perhaps picked the wrong audience). And without a clearly-defined audience to focus on, the likelihood of meeting the needs of any audience is low.

So how do we go about resolving these issues?

  1. Framework/API/Code level:
    1. Strip this down as much as possible – it should provide only the low-level, fundamental services needed to build the rest on. It should certainly not include any complete solutions like “blogs” or “forums” (or even “books”).
    2. Re-use as much code from Symfony2 as possible – if Drupal doesn’t absolutely NEED its own unique version of this sort of code, don’t write it. Let somebody else maintain this code so Drupal core developers can focus on the things that make Drupal unique.
  2. Site builder level:
    1. Figure out the key capabilities supplied by common contrib modules, and move them into the base distribution (much like CCK -> Field API). Newsflash: if 90% of Drupal developers use the same 10-20 modules on EVERY SINGLE SITE, then those modules ARE de facto core, whether you like it or not. Drupal without these modules is not a useful site-building system for most developers. These modules might include things like:
      • Date, Link and other useful field types
      • some subset of Display Suite and/or Panels
      • Fieldgroup
      • Better Formats
      • some kind of WYSIWYG/image handling (I like WYSIWYG+CKEditor+CKEditor Link+IMCE)
      • Meta Tags
      • Node Hierarchy (since Book is too specific and poorly-named, and menus/taxonomy just don’t work as it seems they should for building hierarchies)
      • Token
      • Pathauto
      • Views
      • Webform
      • Workbench or some other workflow solution
      • XML Sitemap
      • etc.
    2. Focus your usability efforts on this audience and these capabilities. Drupal is still pretty darn confusing even for experienced site builders – partly because so much needed functionality IS supplied by a variety of contributed modules, each of which may do things its own way.
  3. Novice/end-user site owner level:
    1. Clearly relegate this to installation profiles and separate documentation for this audience. It’s gotta be done – no more whining about how installation profiles might not end up being adopted or meeting needs. No other approach to “fixing” Drupal is going to either. Also stop whining about how you might miss out on millions of non-technical newbies who might go to Tumblr.com instead if you don’t make this audience part of your core. They’re going to do that anyway. Drupal isn’t right for them, and persisting in thinking it is is distracting you from fixing it for your real core audience.
    2. Stop doing usability studies of this audience (except with reference to improving those installation profiles and the associated documentation). It’s like asking how you can make the mathematics of quantum physics more accessible to English majors. Stop doing that. Please, just stop. It’s a waste of time and you’re losing focus on what really matters.

The Way Forward
First, figure out who your core audience is. How? Figure out where you have unique strengths compared to your competitors. Now, focus relentlessly on meeting the needs of that audience.

Ultimately, I think site-builders are the core audience of Drupal. Thus I think Drupal’s core developers and UX team should focus relentlessly on meeting site builders’ needs. To avoid distractions, the low-level core code should be stripped down and migrated to relying on an outside library/framework (Symfony2) as much as possible, and the focus on non-technical end users should shift to a separate project of installation profiles and documentation built on top of Drupal, rather than being part of Drupal itself.

Otherwise I think Drupal, as a project and a community, is going to have a very difficult time meeting anyone’s needs well, and may even collapse under its own weight.
