What are decision trees?
Decision trees fall under the category of supervised machine learning and are generally a go-to algorithm for classification, depending of course on the use case and the data. Decision trees work by statistically discriminating between the attributes of a known object. Starting at a root node, branches are created according to some criterion. The most basic example of this is a binary split. If you’re like me and learn better with visual aids, consider the following:
Basic example of a decision tree
As you can see, decision trees use ‘if, then’ logic. In practice, decision trees are more complex than the above image, and the ‘YES’ and ‘NO’ decisions are replaced by statistical probabilities. So in reality, the above example can even be misleading, because a decision split rarely has an exact 0.5 probability of occurring. And this makes sense, because what would be the point of relying on a machine learning algorithm that is only as good as flipping a coin? (That is not really how it works, since the decision isn’t ‘random’ like a coin flip, but it’s a simple analogy.)
Decision trees will often have many branches and end nodes. How many depends largely on the number of attributes the object has. This increase in nodes (or attributes) has a cumulative effect, and a tree can become very complex very quickly. Even if you keep a decision tree at its most basic, say 2 exiting branches for every node, the number of nodes still doubles at each ‘level’, which is exponential growth. One way of tackling this complexity is to use random forests. Random forests combine many decision trees to form, as the name suggests, a forest. I personally prefer to think of random forests as jungles because they are not very uniform, and in my mind jungles are the antithesis of the uniformity more often found in forests.
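To make the ‘if, then’ idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the attributes (cloudy, humid), the probabilities at the end nodes, and the function names. A real decision tree learns its splits and probabilities from training data rather than having them typed in by hand, and a real random forest also trains each tree on a random sample of the data, which this sketch leaves out.

```python
# Hypothetical toy tree: the attributes and probabilities are invented
# for illustration, not learned from data.
def predict_rain(cloudy: bool, humid: bool) -> float:
    """Return an estimated probability of rain from two yes/no attributes."""
    if cloudy:
        if humid:
            return 0.85  # end node: cloudy and humid
        return 0.40      # end node: cloudy but dry
    if humid:
        return 0.25      # end node: clear but humid
    return 0.05          # end node: clear and dry


def nodes_at_level(level: int, branches: int = 2) -> int:
    """Count the nodes at one level when every node has `branches` exits.

    The root node is level 0.
    """
    return branches ** level


def forest_estimate(tree_estimates: list[float]) -> float:
    """Combine several trees by averaging their individual estimates."""
    return sum(tree_estimates) / len(tree_estimates)


if __name__ == "__main__":
    print(predict_rain(cloudy=True, humid=False))  # 0.4
    print(nodes_at_level(10))                      # 1024
    print(forest_estimate([0.85, 0.70, 0.90]))
```

Note how quickly nodes_at_level grows: with only 2 branches per node, level 10 already holds 1024 nodes.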
What on earth does all this have to do with conversations and chatbots? Think about what happens when you have a conversation with someone. No, I’m not talking about that chatterbox who never lets you get a word in edgeways, but a real conversationalist. A good conversation is a back-and-forth, where one person talks while the other listens, and then they switch roles. It’s what happens when the roles are switched that is important here, because at that point a decision is made about what to say next. That decision is based on the last thing the other person said. While this kind of decision-making works well for humans (they can, within reason, make the decision very quickly), it is not so effective when a computer is behind the wheel. In order to have a ‘conversation’ with a computer, a chatbot for example, a decision tree would need a branch for everything the user might say, and the logic quickly becomes too demanding to be effective. Decision trees can be effective, but only in certain use cases such as Frequently Asked Questions (FAQs) or simple question-answer pairs. These uses are easily managed through buttons that a user presses to progress in the flow. Outside of those cases, using buttons to navigate through a conversation would be a terribly awkward experience. Arguably it wouldn’t even qualify as a conversation. A user needs to be able to type at the very least. Having the ability to type to a chatbot vastly increases interactivity, providing a more ‘natural’ flow.
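A button-driven FAQ flow of the kind described above can be sketched as a small decision tree in Python. The questions, answers, button labels and the walk function are all hypothetical and are not taken from any chatbot framework.

```python
# Hypothetical FAQ flow: each node has a prompt and the buttons that
# lead to the next node. A node with no buttons is an end node.
FAQ_TREE = {
    "start": {
        "prompt": "What do you need help with?",
        "buttons": {"Billing": "billing", "Delivery": "delivery"},
    },
    "billing": {
        "prompt": "Invoices are emailed at the start of each month.",
        "buttons": {},
    },
    "delivery": {
        "prompt": "Is the order being shipped within the country?",
        "buttons": {"Yes": "domestic", "No": "international"},
    },
    "domestic": {"prompt": "Delivery takes 2 to 3 working days.", "buttons": {}},
    "international": {"prompt": "Delivery takes 7 to 10 working days.", "buttons": {}},
}


def walk(tree: dict, presses: list[str], start: str = "start") -> str:
    """Follow a sequence of button presses and return the final prompt."""
    node = start
    for label in presses:
        buttons = tree[node]["buttons"]
        if label not in buttons:
            # The user can only go where a button exists.
            raise ValueError(f"No button {label!r} at node {node!r}")
        node = buttons[label]
    return tree[node]["prompt"]


if __name__ == "__main__":
    print(walk(FAQ_TREE, ["Delivery", "No"]))
```

The ValueError branch is the limitation in miniature: anything the user wants to say that is not already a button has nowhere to go.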
Support Vector Machines (SVMs) have entered the chat. SVMs are supervised machine learning models. As their name suggests, SVMs discriminate between vector points (the individual data representations). The distance between vectors is measured and, if the model determines there is more than one distinct class, these classes are split by a ‘hyperplane’, a virtual line in the sand. The final hyperplane is the optimal one among the various possible hyperplanes: the one that achieves the maximum distance from the closest data points on either side of it. These closest points are referred to as the ‘support vectors’. Much more is involved in SVMs, but I want to make this easily digestible. It is, after all, only a blog post. As with most things, visualising makes it easier to understand:
Example of an optimal hyperplane, Source
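The idea of picking the hyperplane with the widest gap can be sketched with a few 2D points in Python. This is only an illustration under simplifying assumptions: the points, the labels and the three candidate lines are made up, and a real SVM finds the best hyperplane by solving an optimisation problem rather than by comparing a handful of hand-picked candidates.

```python
import math

# Made-up 2D data: two points per class, labelled -1 and +1.
POINTS = [(1.0, 1.0), (2.0, 1.0), (4.0, 3.0), (5.0, 4.0)]
LABELS = [-1, -1, 1, 1]

# Candidate lines written as (w, b), meaning w[0]*x + w[1]*y + b = 0.
CANDIDATES = [
    ((1.0, 0.0), -3.0),   # vertical line x = 3
    ((1.0, 0.0), -2.5),   # vertical line x = 2.5
    ((1.0, 1.0), -5.0),   # diagonal line x + y = 5
]


def signed_distance(w, b, point):
    """Distance from the point to the line, positive on the side w points to."""
    return (w[0] * point[0] + w[1] * point[1] + b) / math.hypot(w[0], w[1])


def margin(w, b, points, labels):
    """Smallest distance from the line to any point.

    The result is negative if any point lies on the wrong side of the line.
    """
    return min(
        label * signed_distance(w, b, point)
        for point, label in zip(points, labels)
    )


def best_candidate(candidates, points, labels):
    """Pick the candidate line whose closest point is furthest away."""
    return max(candidates, key=lambda c: margin(c[0], c[1], points, labels))


if __name__ == "__main__":
    for w, b in CANDIDATES:
        print(w, b, round(margin(w, b, POINTS, LABELS), 3))
    print("best:", best_candidate(CANDIDATES, POINTS, LABELS))
```

With these points the diagonal line wins, and the two points closest to it, (2, 1) and (4, 3), play the role of the support vectors.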
Botpress utilises SVMs because of their suitability for text and natural language processing. By default, node-svm, the library used by Botpress, uses k-fold cross-validation with k = 4, which I feel strikes a good balance between performance and model accuracy. This validation process randomly splits the entire dataset into 4 parts. Three of the parts are used to train the model, which is then tested on the 4th. This cycle is repeated until every part has had a ‘turn’ as the test set. Again, a visual aid from Wikipedia (showing a 3-fold validation example):
K-Fold Cross Validation, Source
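The splitting described above can be sketched in a few lines of Python. This is a generic illustration of k-fold cross-validation with k = 4, not the actual node-svm or Botpress code. The function name and the seed parameter are assumptions made for the example.

```python
import random


def k_fold_splits(items, k=4, seed=0):
    """Yield (train, test) pairs so that each part is the test set once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # random assignment to parts

    # Deal the shuffled items into k parts of (almost) equal size.
    folds = [shuffled[i::k] for i in range(k)]

    for test_index in range(k):
        test = folds[test_index]
        train = [
            item
            for i, fold in enumerate(folds)
            if i != test_index
            for item in fold
        ]
        yield train, test


if __name__ == "__main__":
    for round_number, (train, test) in enumerate(k_fold_splits(range(12)), 1):
        print(f"round {round_number}: train on {len(train)}, test on {len(test)}")
```

With 12 items and k = 4, every round trains on 9 items and tests on the remaining 3, and after 4 rounds every item has been used for testing exactly once.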