Ollie Glass

Data science today

2nd October 2018

In just a few years, machine learning has gone from being innovative to being expected in an investment pitch or company strategy. Google CEO Sundar Pichai announced the company’s shift from a mobile-first to an AI-first world, and Amazon has reorganised itself around AI and machine learning. Data science is the fastest-growing field in the US job market as companies race to build their capabilities and to launch projects.

These industry-wide trends have reached smaller companies. Even early-stage ventures now produce substantial amounts of data every day. Data from marketing, customer service, sales, operations, product usage and financial transactions can all be collected and stored cheaply. Outside the company, external data such as competitors’ product changes and advertising, industry news and statistics are readily available.

Transforming raw data into commercial insight has never been easier. Scientific methods have been adapted to fit industry needs and extend traditional business intelligence techniques, producing lead scoring, price optimisation and customer churn prediction techniques. Software for processing, analysing and automatically taking actions from data makes all of this possible. Once arcane and expensive, data analysis software is now widespread and freely available.

But data science and machine learning projects have a very high failure rate. Ideas seem promising and data scientists start experimenting, but teams struggle to follow through. Projects run on indefinitely without clear goals or deliverables, teams lose direction and momentum, and important individual contributors leave. What makes these projects so difficult? How can they be delivered effectively?

As a data science practitioner working with venture-backed startups, I’m often running the first data science projects and setting up new functions in my clients’ organisations. This work will have a significant impact on the business, so it’s important that it goes well. I’ve developed an approach to managing projects to ensure they run smoothly.

In these pieces I’ll show how my approach works and give examples for a three-month project. Projects typically take between three and twelve months, and I recommend that clients start small. I’ll describe the tasks and outcomes at each stage, how I manage common risks and challenges, and how I work with stakeholders.

Managing data science projects

To manage data science projects, I divide them into four stages: discovery, research, production and ongoing operation. I’ll give examples of the risks, outcomes and timelines of each stage on a small project.

Understanding and managing uncertainty in data science projects

Most data science projects require researching approaches and techniques, collecting and processing data, and testing different model architectures and parameters. It can’t be known in advance how long this research will take or the results it will bring. How can you manage this uncertainty when you’re leading a project?