Data science and data product development
Contract data scientist and developer working end to end, from idea to implementation, building data products for sales, marketing, decision support and operations.
Feel free to email me at firstname.lastname@example.org
I designed and built a machine learning system to rate image quality and factor it into search results, giving Picfair a market-leading photography search engine.
The machine learning system is deployed into production as a scalable microservice, so as Picfair grows they can just add more servers to handle incoming images. It's seamlessly integrated into the existing tech stack, with testing, server monitoring, a clean API and interfaces for the Rails and Elasticsearch codebase.
I designed and built an interactive visualisation of London's property market, a system to find stories from rental trends each month, and data-driven marketing pages for Rentify's marketing team.
All of these tools are powered by a central data product. Combining Rentify's exclusive lettings data with third-party sources, including London Datastore and Land Registry records, it creates a rolling, proprietary analysis of the London property and rental market.
The Promise was a powerful historical drama set in Israel and Palestine. Channel 4 expected strong and wide-ranging reactions to the programme, and wanted a way to show this conversation while maintaining editorial guidelines and balance.
I designed a topic modelling system to group the conversation into themes in a way that reflected Channel 4's editorial policy, built an interactive visualisation to let viewers see and explore the conversation, and designed a server architecture to make it work at scale.
Behind the scenes, a data service collects tweets matching the hashtag #c4thepromise. Natural language processing techniques clean the text content, then store the words and sentences in a graph data structure. The topic modeller processes the graph and exports an optimised data format to a separate web app, which serves the visualisation and client data requests at scale.
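To give a feel for the clean, graph, then group flow described above, here is a minimal Python sketch using only the standard library. It is not the production code: the function names (clean_tweet, build_graph, find_themes), the STOPWORDS list, the min_weight threshold and the sample tweets are all made up for illustration, and grouping words by connected components of a co-occurrence graph is a deliberately simple stand-in for the real topic modelling step.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical, tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "this", "was", "from", "much", "and", "for", "that", "with"}


def clean_tweet(tweet):
    """Lowercase a tweet, drop URLs, mentions and hashtags, return content words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[@#]\w+", " ", text)       # remove @mentions and #hashtags
    words = re.findall(r"[a-z']+", text)
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def build_graph(tweets):
    """Build a weighted co-occurrence graph: edge weight = tweets sharing both words."""
    graph = defaultdict(lambda: defaultdict(int))
    for tweet in tweets:
        words = sorted(set(clean_tweet(tweet)))
        for a, b in combinations(words, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def find_themes(graph, min_weight=2):
    """Group words linked by edges of at least min_weight (connected components)."""
    seen, themes = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(
                n for n, w in graph[node].items() if w >= min_weight and n not in seen
            )
        if len(group) > 1:  # ignore words that never pair up strongly
            themes.append(group)
    return themes


# Invented sample tweets, for illustration only.
sample_tweets = [
    "Brilliant acting in tonight's episode #c4thepromise",
    "The acting in this episode was brilliant http://example.com #c4thepromise",
    "Learning so much history from this drama #c4thepromise",
    "@friend the history in this drama is gripping #c4thepromise",
]

if __name__ == "__main__":
    for theme in find_themes(build_graph(sample_tweets)):
        print(sorted(theme))
```

In this sketch the themes are just sets of words; in practice each theme would be exported in a compact format for the web app to serve.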
I use Python for data science, making extensive use of the Jupyter Notebook with numpy, scipy, pandas, scikit-learn and specialised libraries for exploratory work, model building and simulations.
I've used Postgres and MySQL extensively, as well as NoSQL databases including MongoDB, Elasticsearch, Redis and Neo4j. I've worked with data from APIs, web scrapers, camera phones, sensors, countless CSV files... I once wrote a parser and named entity recognition tool to import data from a team's emails. I'm also happy using spreadsheets to share work with non-data-scientists!
For visualisations I like d3.js and plot.ly on the web, and use matplotlib and DOT for more technical charts and network diagrams.
In data science projects, Flask is great for putting web-facing API wrappers around code, and I sometimes use Rails to build quick apps for dashboards, data labelling or operational support.
Read Courtney Boyd Myers' interview with me about marketing, artificial intelligence, creative technology and just about everything else.
When I'm not working I like to make things on the internet.