Data Science Certificate case studies: David Tudor-Griffith


David Tudor-Griffith was part of the first IFoA Data Science Certificate cohort. In this blog post he tells us about his experience of the course and what actuaries can gain from it.

Over the last few years I had become increasingly interested in data science. I'm an independent actuarial consultant in the life insurance and pensions sectors, and part of this interest came from the need to keep my skills up to date and relevant.

In terms of study, I'd worked through many courses on DataCamp and completed a Microsoft certification to give me a grounding in its many different areas. I'd used R in some of my roles, which was a useful practical application of what I had learned. As a result, this course seemed ideal: it was a chance to gain a formal certification that recognised my interest in the field and covered how data science interacts with actuarial work.

I was pleased to be able to enrol for the first sitting – it sold out in days as there was a lot of interest.

Between signing up and the start of the course, the pandemic hit. This meant a lot of disruption for all of us and, in my case, home schooling alongside my work commitments. However, the course is very flexible and I was able to schedule my study around all of this. There are no fixed hours to attend, and although the seminars are live, any that you miss are recorded so that you can watch them at your own convenience. In total, the course lasted around 10 weeks. The good news is that there are no exams either: assessment is based on three assignments, which are submitted online during the course.

The deadlines for the three assignments allowed around two to three weeks for each one. The learning materials were very helpful, and the course had a two-speed structure. The core element is what you need to cover to complete the course, and it isn’t intended to require technical knowledge. In addition, there are optional exercises, often focused on a particular technical area (eg Tableau or Python), which provide more depth for those interested in that area.

This course is designed to be an introduction to data science and to allow actuaries to talk about data science concepts with more confidence. It’s not an in-depth qualification that will turn candidates into machine learning or Python experts. The intention is that actuaries who develop an interest in these areas will use the course as a springboard to develop their skills further.

What strikes me is that data science is a hugely diverse area; it’s not just building machine learning models, which is what typically gets most attention. Actuaries do have a background in some areas of data science, such as structured data manipulation, modelling and using data correctly. However, there are other areas that actuaries will not be familiar with, such as cloud computing, unstructured data or natural language processing. It's valuable for actuaries to have some understanding of these newer areas, as they have the potential to affect our work in the future.

Some areas of the course are easy to apply to everyday actuarial work, such as data visualisation, good data management and data ethics. There are also some very practical aspects, such as making graphs cleaner, applying more rigour to how we set up our data, or using Excel features such as Power Query to import data. One of the tools covered was OpenRefine, which is open source and very easy to use, and which can clean data far more effectively than doing it by hand, saving hours of work.
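To give a flavour of the kind of cleaning OpenRefine automates, here is a minimal Python sketch of a "fingerprint" key-collision approach, in which messy spellings of the same value are grouped by a normalised key. It is an approximation written for illustration, not OpenRefine's own code, and the insurer names are made up.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate a 'fingerprint' key for a text value (illustrative only)."""
    # Trim surrounding whitespace and lowercase
    value = value.strip().lower()
    # Fold accented characters to plain ASCII (e.g. "é" becomes "e")
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, keeping letters, digits and whitespace
    value = re.sub(r"[^\w\s]", "", value)
    # Split into words, remove duplicates, sort and rejoin
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group raw values sharing a fingerprint; keep groups with several spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


if __name__ == "__main__":
    # Hypothetical, inconsistently typed insurer names
    names = ["Acme Life Ltd", "acme life ltd.", " Ltd Acme Life", "Beta Pensions", "Béta Pensions"]
    for key, variants in cluster(names).items():
        print(key, "->", variants)
```

Each group can then be reviewed and merged into a single agreed spelling, which is the step that saves the hours of manual checking.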

Alongside my experience in data science so far, this course has shown me not just how data science can interact with actuarial work, but also that the two are different skillsets. While some people may be able to gain the skills to be both a data scientist and an actuary, the two roles are distinct and often complementary. In general, actuaries are unlikely to have the deep technical coding and computing background needed to build very intricate machine learning models; similarly, data scientists are unlikely to have the background in insurance, the experience of making judgements where data is sparse, or the professionalism requirements. It could well be that, over time, actuaries hand over some of the data and modelling tasks to data scientists, who could do this more efficiently than actuaries when datasets are large.

The course is intended to motivate actuaries to keep learning and developing their data science skills. The two main areas I will continue to learn about are R coding and working with large datasets. Large datasets seem to be increasingly common, and handling them in Excel can cause problems: spreadsheets become slow to calculate and update, and are difficult to review and use in a controlled way (witness the problem Public Health England had in October 2020, when 16,000 confirmed coronavirus cases weren’t logged because Excel reached its size limit). Data science teaches the importance of defining a data schema (essentially a template setting out what the data should contain), using version control and structuring data well. This is important for every actuary who works with spreadsheets in some way (which is nearly all of us!).
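As a rough illustration of what a data schema buys you, the Python sketch below declares the expected columns, types and simple checks for a policy data extract, then reports any record that doesn't conform. The column names and rules are hypothetical, and real projects would more likely use a dedicated validation library; this is only meant to show the idea.

```python
from datetime import date

# Hypothetical schema for a policy extract: column -> (expected type, sanity check)
SCHEMA = {
    "policy_id": (str, lambda v: len(v) > 0),
    "date_of_birth": (date, lambda v: v <= date.today()),
    "sum_assured": (float, lambda v: v >= 0),
    "sex": (str, lambda v: v in {"M", "F"}),
}


def validate(record: dict) -> list:
    """Return a list of problems found in one record (empty if it conforms)."""
    problems = []
    # Columns the schema expects but the record lacks, and vice versa
    problems += [f"missing column: {c}" for c in sorted(SCHEMA.keys() - record.keys())]
    problems += [f"unexpected column: {c}" for c in sorted(record.keys() - SCHEMA.keys())]
    for column, (expected_type, check) in SCHEMA.items():
        if column not in record:
            continue
        value = record[column]
        if not isinstance(value, expected_type):
            problems.append(
                f"{column}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
        elif not check(value):
            problems.append(f"{column}: value {value!r} fails check")
    return problems


if __name__ == "__main__":
    good = {"policy_id": "P001", "date_of_birth": date(1970, 5, 1), "sum_assured": 100000.0, "sex": "F"}
    bad = {"policy_id": "P002", "date_of_birth": "1970-05-01", "sum_assured": -5.0}
    print(validate(good))
    print(validate(bad))
```

Because the expectations are written down once, the same checks can be rerun every time new data arrives, rather than relying on someone spotting a mistyped date in a spreadsheet.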

In summary, this course is a very good introduction to data science and is particularly suited to those who are new to the field and want an overview of it. However, even those with some experience of data science can still learn plenty if they study the extension materials for the exercises using Python, Tableau or OpenRefine. As AI and machine learning become more commonplace, a basic knowledge of these areas will become even more important for actuaries, and this course is a great way to learn more.

In 2020 the IFoA, in collaboration with the Southampton Data Science Institute, launched its Certificate in Data Science. The IFoA has long recognised the increasing importance of data science, but this was the first formal qualification it had launched in the area. It is not part of the syllabus for any of the actuarial qualifications, such as the Associate or Fellowship routes; it is a standalone certificate that serves as an introduction to data science.
