“War is ninety percent information,” said Napoleon Bonaparte in the nineteenth century. This remains as true as ever in the information age we live in today. Some of the most successful companies, such as Google, Facebook, and Amazon, base their revenue models on sharing and selling information, and they obtain that information by analysing data. But it is not only the big fish that can make money from data: lots of companies benefit from using data to gain valuable insights. That is why we at Clocktimizer have introduced our own Machine Learning Engine, to give our customers an edge over the competition.
However, it can be difficult to know where to start with Machine Learning. How much data do I need? How do I collect it? What format should I store it in? In this four-part blog series, we introduce you to the art of data science. You will learn best practices for collecting, cleaning, analysing, visualising, and interpreting data. We will also help you think about the ways Clocktimizer’s Machine Learning can support your law firm. In the first part of the series, we looked at collecting data. In part 2, we introduced you to data preparation. Part 3 got to the meat of the problem: data analysis. In this final instalment, we look at visualising and interpreting your results, and at what comes next.
If you want to show your results to other people, it is usually a good idea to visualise them. Visualisations are not only less boring than cold numbers; they also make it much easier to spot insights, and they often connect better with your intuition. There are lots of ways to visualise your data, from line graphs and bar charts to heat maps and pie charts. Keep in mind that the goal is not to create nice-looking or complex visualisations. Rather, it is to summarise your data into something understandable. Be sure to label your axes, include a legend, and add a descriptive title: visualisations should be self-explanatory.
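As a rough illustration of those three rules, here is a minimal sketch using the matplotlib library (our choice for the example; any charting tool will do). The function name `grouped_bar_chart`, the matter phases, and all the numbers are hypothetical, made up purely for illustration and not taken from any real firm.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen, so no display is needed
import matplotlib.pyplot as plt


def grouped_bar_chart(categories, series, xlabel, ylabel, title):
    """Draw one bar per category for each named series.

    `series` maps a legend label (e.g. a year) to one value per category.
    """
    n = len(series)
    width = 0.8 / n  # share 80% of each category slot between the series
    fig, ax = plt.subplots(figsize=(8, 4))
    for i, (label, values) in enumerate(series.items()):
        # shift each series sideways so its bars sit next to the others
        offsets = [c + (i - (n - 1) / 2) * width for c in range(len(categories))]
        ax.bar(offsets, values, width, label=label)
    ax.set_xticks(list(range(len(categories))))
    ax.set_xticklabels(categories)
    ax.set_xlabel(xlabel)  # label the axes...
    ax.set_ylabel(ylabel)
    ax.set_title(title)    # ...add a descriptive title...
    ax.legend()            # ...and a legend explaining what each colour means
    fig.tight_layout()
    return fig, ax


# Hypothetical, made-up numbers: hours recorded per matter phase in two years.
fig, ax = grouped_bar_chart(
    categories=["Intake", "Discovery", "Negotiation", "Trial"],
    series={"2022": [120, 340, 180, 210], "2023": [150, 300, 220, 260]},
    xlabel="Matter phase",
    ylabel="Hours recorded",
    title="Hours recorded per matter phase, 2022 vs 2023 (illustrative data)",
)
# fig.savefig("hours_per_phase.png", dpi=150)  # write the chart to an image file
```

The chart itself is deliberately plain: the axis labels, legend, and title carry most of the explanation, so a reader can understand it without the accompanying text.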
If you encountered unexpected results during your analysis: congratulations! Now it gets interesting. What did you expect instead? Why did you expect it? And why does the data show you something different? There are many possible explanations for unexpected results. Maybe your data was biased, and you should collect more of it, or collect it from a different source. Maybe you accidentally changed something essential about your data during the cleaning step. Or maybe you have simply found out something about the real world that you did not expect to be true. Be sure to explore all the options, and consider a follow-up analysis to confirm your thoughts.
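One quick way to test the "maybe the cleaning step changed something essential" explanation is to compare a few summary statistics before and after cleaning. The sketch below assumes your records are simple Python dicts; the `cleaning_report` helper, the `hours` field, and the time entries are hypothetical examples.

```python
from statistics import mean


def cleaning_report(raw, cleaned, field):
    """Compare one numeric field before and after a cleaning step.

    `raw` and `cleaned` are lists of dicts (one dict per record); records
    where `field` is missing or None are ignored when computing the means.
    """
    before = [r[field] for r in raw if r.get(field) is not None]
    after = [r[field] for r in cleaned if r.get(field) is not None]
    return {
        "rows_before": len(raw),
        "rows_after": len(cleaned),
        "fraction_dropped": 1 - len(cleaned) / len(raw),
        "mean_before": mean(before),
        "mean_after": mean(after),
    }


# Hypothetical time entries; the cleaning step removed the 14-hour entry as an "outlier".
raw = [{"hours": 1.5}, {"hours": 2.0}, {"hours": 2.5}, {"hours": 14.0}]
cleaned = [r for r in raw if r["hours"] < 10]
report = cleaning_report(raw, cleaned, "hours")
# a quarter of the rows are gone, and the mean hours per entry falls from 5.0 to 2.0
```

If that 14-hour entry was in fact a genuine long day in court, the "cleaning" has quietly changed what the data says, and the unexpected result may come from that rather than from the real world.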
There is one issue we have yet to point out: overfitting. Overfitting happens when you think you have learned a lot from your data, but your new “knowledge” does not apply to the real world. Your “knowledge” is so specific to this exact data set that it does not generalise to new or unseen data, so it can no longer give an accurate view of the real world.
One way to find out whether you are overfitting is to divide your data set into two parts. You perform your data analysis on the first part only. Once you have reached a conclusion, you test it on the second part to see whether it also holds for this unseen data. If it does not, you have most likely overfitted your model to the first part. If it does hold on the unseen data, you can be much more confident that you have found something that generalises beyond the data you started with.
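To make the idea concrete, here is a minimal sketch of this hold-out check in plain Python. Everything in it is hypothetical: the data (hours spent on a matter versus whether it went over budget), the helper names, and the two toy "models". The memoriser overfits on purpose by remembering every training example, while the threshold rule captures the general pattern.

```python
import random


def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` and split it into a training part and a held-out test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed: the same split on every run
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, rows):
    """Fraction of (x, y) pairs for which predict(x) == y."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def fit_memoriser(train):
    """Deliberately overfits: remembers every training example and guesses 0 otherwise."""
    table = dict(train)
    return lambda x: table.get(x, 0)


def fit_threshold(train):
    """Simple rule: predict 1 when x >= t, choosing the t (a training x) that fits best."""
    candidates = sorted({x for x, _ in train})
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), train))
    return lambda x: int(x >= best)


# Hypothetical data: hours spent on a matter (x) and whether it went over budget (y).
rows = [(hours, int(hours >= 20)) for hours in range(40)]
train, test = holdout_split(rows)

memoriser = fit_memoriser(train)
rule = fit_threshold(train)
for name, model in [("memoriser", memoriser), ("threshold rule", rule)]:
    print(f"{name}: train {accuracy(model, train):.2f}, test {accuracy(model, test):.2f}")
```

Both models score perfectly on the training part. On the held-out part, the memoriser gets every over-budget matter wrong because it has never seen those exact hours, while the threshold rule should do at least as well and usually much better. For real projects, scikit-learn's `sklearn.model_selection.train_test_split` handles the splitting step.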
Continue with the cycle
You have probably gained valuable insights during this process, and you can now put your new knowledge to use. But it is not over: data analysis never really ends, because answers bring new questions. Maybe a great new source of data has become available in your business. Perhaps you have encountered new problems. Perhaps your new insights have raised questions that require further research. Or maybe you found out that the data you have does not address the problems you were interested in, and you need new data. You have just finished your first iteration of data analysis, and you can now go back to data collection to start the next one, discovering and learning even more along the way.
Machine learning with Clocktimizer
Of course, you do not have to do all the data analysis by yourself. Clocktimizer has developed a machine learning engine that can perform classification tasks for you. All your firm needs to provide is the data and the questions you have about it. Clocktimizer’s Data Lab can then get to work on the actual analysis and classification.
If you feel ready to scale up your data analysis, or if you want dedicated people to do the data analysis for you, look for data engineers and data scientists. In general, a data engineer is good at collecting, moving, and storing data, and will help you with data collection. A data scientist is good at cleaning data, learning from data, and generating insights, and will take care of data cleaning and analysis.
If this introduction to Machine Learning has sparked your interest or made you wonder what secrets your data holds, or if you are already on your way with data analysis and want a helping hand, get in touch with Clocktimizer today. Who knows what you might discover?
Keep an eye on our social media: an e-book containing all four blogs is coming next week.