Reasons Why Many Data Science Projects End in the Lab

Not all data science projects succeed. If a predictive model never gets deployed and used, the result can be months of lost effort, tens of thousands of pounds in costs, and missed opportunities. Here are some of the reasons models do not reach production.

Out of Time

How long can you afford to let a project overrun before hitting a stop-loss? If the period from business objective to actionable insight is too long, the opportunities you need to realise may be lost forever.

Key People Leave

Most traditional machine learning projects take months or even years to come to fruition, much longer than the notice period afforded to personnel in a typical organisation. With data science being such an in-demand skill set, retaining data scientists can be crucial. Automated machine learning protects your project, and your intellectual property, by dramatically shortening the time it takes for a project to complete successfully, and by embedding the learning in the platform itself, where it is democratised across all of the stakeholders rather than held by one individual.

Models Are Not Accurate

Data science can feel as much art as science. Hundreds or even thousands of algorithms compete for the data scientists' attention as a potential solution to the business objective. Given limited time and resources, and without automated machine learning, only a small number of these can be trialled within budget and time constraints. Even if the best algorithms are chosen, there is an impossibly large number of permutations of settings, hyper-parameters and ways to measure a model's effectiveness. Automating the selection of modelling approaches and running leaderboards that pitch algorithms against each other in heats is a reliable and cost-effective way to hit on the most accurate models in minutes or hours, not weeks or months.
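The leaderboard idea can be illustrated with a minimal sketch: several candidate algorithms are scored under identical cross-validation conditions and ranked by accuracy. This is a generic illustration using scikit-learn, not DataRobot's own mechanism; the dataset, models and metric are stand-ins.

```python
# Minimal "leaderboard" sketch: score each candidate algorithm with the
# same 5-fold cross-validation and rank by mean accuracy.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a real business dataset.
X, y = make_classification(n_samples=500, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Rank candidates from best to worst mean cross-validated accuracy.
leaderboard = sorted(
    ((cross_val_score(model, X, y, cv=5).mean(), name)
     for name, model in candidates.items()),
    reverse=True,
)
for score, name in leaderboard:
    print(f"{name}: {score:.3f}")
```

An automated platform runs the same contest at far greater scale, over hundreds of algorithms and hyper-parameter combinations, but the principle is the same: identical evaluation conditions, ranked results.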

Asking the Wrong Question

No matter how good your data science programme is, if you ask it to solve the wrong problems, you will not maximise the return on your investment. Perhaps the business objectives are not always clearly articulated to the data science team. Transparent modelling approaches from automated machine learning empower a wider range of employees to become citizen data scientists, bringing the solution closer to those who really understand the goals.

Poor Data

Machine learning needs lots of data. Not just thousands of rows, but also features that are pertinent to the problem domain. The success of any predictive modelling project depends on having sufficient volumes of fresh data, cleaned and enriched, ready for training.

Insufficient Consideration of Deployment

Software development stakeholders know just how valuable DevOps has been to the reliable delivery of software projects into the hands of users. Data science projects are often lop-sided, with not enough thought given up front, and throughout, to how the final models will be used. This deficiency can lead to rushed, bungled deployment and inappropriate scoring processes, such as batch processing where real-time scoring is needed. Projects that do not adequately consider the entire life cycle - from business objective through modelling to model insights, behavioural testing, deployment and monitoring - may fall at the last hurdle, where failure is most costly.

No Executive Sponsorship

The odds of success are stacked in favour of programmes with full buy-in from all of the stakeholders. Funding, resource allocation, flexibility and clarity of goals all contribute towards actionable insights and strong ROI.

Too Much Autonomy

Giving trusted stakeholders the freedom to pursue their ideas and deliver the best solution can really benefit a project, but it may harm the wider programme. Finding the right balance between autonomy and homogeneity is necessary to trade agility and best-of-breed tools against the ability to scale processes and technologies across the enterprise. Before allowing every project to run its own stack, consider how deployment, monitoring, support and maintenance will be affected.

Echo Chambers

When hiring data scientists, if you have the luxury of choice, try to throw in a few wildcards. Ideally, they should all share a degree of overlapping knowledge, but each should bring something different to the table. There are hundreds, maybe thousands, of applicable modelling approaches at their disposal, and even the best data scientists will not be able to consistently apply each one where it is best suited. Robotica uses DataRobot automation to pitch modelling approaches against each other, drawing from over 600 algorithms, to combat these cognitive boundaries.

Not Invented Here

Do not let the egocentric ideal of only applying technologies and algorithms invented within the organisation hold back your ambitions. Borrow the best from the open source world, as well as from cutting-edge proprietary and in-house stacks, to improve your ability to generate accurate predictive models and deploy them into production.

We Only Know R/Python/MATLAB

If your data science programme is constrained to only a small subset of modelling approaches, the accuracy you can expect from your models will be impeded. DataRobot removes this restriction by applying models from R, Python, TensorFlow, H2O and many more.

Opaque Models

Predictive models that act as black boxes - providing results but no explanation as to how they are derived - have limited value in the modern business world. Regulatory requirements, data protection laws and good ethics all demand transparency in the modelling process and in how individual predictions arrive at their answers. Decide early on whether models that cannot explain their “thought process” are a good fit for your organisation.
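One common transparency technique can be sketched briefly: permutation importance, which measures how much a model's score drops when each feature is shuffled, revealing which inputs actually drive its predictions. This is a generic scikit-learn illustration, not a description of any particular vendor's explanation feature; the dataset and model are stand-ins.

```python
# Permutation importance sketch: shuffle each feature in turn and record
# how much the model's accuracy degrades; larger drops mean the model
# relies more heavily on that feature.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in data: 5 features, only 2 genuinely informative.
X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"feature_{i}: {importance:.3f}")
```

Even an otherwise opaque model becomes easier to justify to regulators and stakeholders when its most influential features can be surfaced and sanity-checked this way.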

Automation from Robotica Machine Learning seeks to increase the likelihood of success by tackling all of these common shortfalls of traditional ML projects.
Request a demo