“War is ninety percent information,” said Napoleon Bonaparte in the nineteenth century. That is as true as ever in the information age we live in today. The most successful companies, such as Google, Facebook, and Amazon, base their revenue models on information, and they get this information by analysing data. But it is not only the big fish that make money from data: lots of companies benefit from leveraging data to obtain valuable insights. That is why we at Clocktimizer have introduced our own Machine Learning Engine to give our customers the edge over the competition.
However, it can be difficult to know where to start with Machine Learning. How much data do I need? How do I collect it? What format should I store it in? In this four-part blog series, we will introduce you to the art of data science. You will learn the best practices for collecting, cleaning, analysing, visualising, and interpreting data. We’ll also help you think about the ways Clocktimizer’s Machine Learning can support your law firm. In this first part, we will look at where you can get your data from.
Data? What data?
You may think that you do not have a lot of data at your disposal, or that the data you do have is not interesting enough. But do not underestimate what you have: there is probably already a lot of data streaming into and flowing through your business. For most companies it is surprisingly easy to collect important data, and there is often more information in that data than you would expect at first glance. Look for three kinds of data: data that you generate, data that you find in your applications, and data that will answer your questions.
Data that you generate
First, take a look at information that you already possess but do not actively collect. There is an amazing amount of low-hanging fruit in every company, and chances are that you just haven’t got around to recording it. You most likely have data on your clients, such as their location, the matters that are running for them, and how long you have known them. You also have information on your lawyers, like the practice group they belong to, their hourly rate, and the matters they are working on.
The challenge here is not to find the data, but to actually start the process of collecting it. It might seem like tedious work for little advantage, but even the simplest data may contain valuable, structured information.
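To make "starting the process of collecting it" a little more concrete, here is a minimal Python sketch of recording client details in a consistent, structured form. The `ClientRecord` fields, the `write_clients` helper, and the sample values are all made up for illustration; your firm will want to track its own fields.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import date


@dataclass
class ClientRecord:
    # Hypothetical fields, loosely based on the client details mentioned above.
    name: str
    location: str
    client_since: date
    open_matters: int


def write_clients(records, handle):
    """Write client records as CSV rows, preceded by a header line."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(ClientRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Made-up example data, not real clients.
clients = [
    ClientRecord("Example Client A", "Amsterdam", date(2015, 3, 1), 4),
    ClientRecord("Example Client B", "London", date(2019, 9, 15), 1),
]

# An in-memory buffer stands in for a real file opened with newline="".
buffer = io.StringIO()
write_clients(clients, buffer)
csv_text = buffer.getvalue()
```

A plain CSV file is only one possible choice here. The point is that every record has the same fields, so the data can be analysed later without guesswork.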
Data from your applications
Second, take a look at all the applications you use during your average work day. Think about Clocktimizer, your time-tracking software, your document management system, your due diligence software, your e-billing systems, or your CRM.
These applications are great sources of data. If you do not already collect this data, you might want to consider starting. It is not very difficult, since most applications already collect data for you in a suitable format, and the only thing you have to do is download it. Again, it may seem like a lot of effort for little information. But keep in mind that collecting and storing data is very cheap compared to what you can gain from it.
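As a rough illustration of how little work a downloaded export can need, the Python sketch below reads a CSV export of time entries and totals the hours per matter. The column names (`matter`, `lawyer`, `hours`) and the sample rows are assumptions for this example; a real export from your own software will have its own layout.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for a file downloaded from an application.
SAMPLE_EXPORT = """matter,lawyer,hours
Matter 001,Lawyer A,2.5
Matter 001,Lawyer B,1.0
Matter 002,Lawyer A,4.0
"""


def hours_per_matter(handle):
    """Sum the 'hours' column for each value in the 'matter' column."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        totals[row["matter"]] += float(row["hours"])
    return dict(totals)


# With a real export you would pass open("export.csv", newline="") instead.
totals = hours_per_matter(io.StringIO(SAMPLE_EXPORT))
```

Even a summary this simple can raise the kind of questions that the rest of this series is about.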
Data that will answer your questions
Third, ask yourself what you would like to know. What business challenges do you have? Which information streams do you need to solve them? What data can give you a better picture of what you would like to know? We will dive deeper into this subject in the next posts in this series, but if you already have some unanswered questions lying around, then have a little brainstorm about where to find relevant data. If, with some effort, you can collect information that is important for answering your questions, then go for it.
Searching for particular data might be more work than collecting data that you already generate, or data that you get from your applications. You now actually have to organise the data collection, which can come with some challenges. But keep in mind that you are now working towards a goal: you are no longer aiming for easily accessible data, but for data that contains useful information.
You probably do not know all the questions that you would like to have answered. The next steps of data analysis will raise more questions that you have probably not thought about yet. That is okay: you will simply have to go back to the data collection step, and this is something that you will have to do over and over again. Data collection alternates with data analysis, because each analysis will give you new questions, which then require new data, which gives you more questions, which again require more data. But remember, during this process you will also learn a lot and get most of your questions answered. So it is worth it.
Now that you have collected your data, you can get started working with it. In the next post we will talk about the principles of cleaning your data, and how to transform it so that you can do some data analysis magic on it.