Artificial Intelligence Algorithm CASE STUDY:
Clustering of news articles
PROBLEM: "Our customers don't want forty articles, from forty sources, on the same news topic. They want one good article on each of forty different topics. But our topic similarity-detection system uses a conventional clustering algorithm -- which is slow! Every article is compared with every other article, so it really slows down when we try to cluster more than 10,000 articles."
SOLUTION: I invented a new clustering technique that scales far better as the number of articles grows. We now cluster 10,000 articles in seconds rather than hours. Because our system is so fast, we can find the topic clusters in a collection of 10,000,000 news articles in just a couple of hours. This helps keep customers satisfied with our news products. We were granted US Patent 9,753,964 for the MIMOSA clustering method.
Software Engineering CASE STUDY:
Overloaded web server farm
PROBLEM: "Our traffic has grown to millions of page views per day. We already use caching, CDNs, and multiple datacenters, but the load on our back-end servers keeps growing. We keep having to spend money on more servers."
SOLUTION: I introduced a template compiler for the existing HTML page templates, which had previously been interpreted at runtime in a high-level language. Without modifying any templates or application code, we reduced server load fourfold and increased webpage throughput fourfold. Server spending was reduced.
Data Science CASE STUDY:
Predict stock-market impact of breaking news
PROBLEM: "Customers want to know which of the thousands of news articles we send them every day are most critical for them to read. Stock traders want to know which breaking news articles are most likely to affect the prices of stocks in their portfolio."
SOLUTION: My colleagues and I studied stock price movements that occurred at the time of important economic announcements and stock-related press releases. Using a database of every US stock trade, and correlating the price jumps against the news text content, we found high-quality signals that predict jump magnitude. I designed and developed a price-jump detector algorithm that outperforms conventional change-point detectors. Our results have been incorporated into automated news article analysis, so that customers receive an impact score on breaking stories within milliseconds.
DATA SCIENCE: With long experience in analyzing business and scientific data, I use data analytics tools to provide unexpected insights and common-sense interpretation for marketing, sales, and product development teams. Recent examples:
- TALLYHO -- I built a search engine in C that performs complex queries over document content and metadata. The TALLYHO engine supports Acquire Media's sales, marketing, editorial, and analytic teams.
- IMPACT -- My colleagues and I developed a machine learning system that predictively correlates news events with stock price changes. The results let customers determine which breaking news articles are likely to move the price of a company's stock.
- SCRUNCH -- I designed and implemented a reliable method for distinguishing between true and illusory jumps in time-series data. The SCRUNCH jump-detector is used to identify news-related stock price movements.
- ENSEMBLE -- My colleagues and I have developed methods to improve data classification by combining multiple classifiers. The methods allow Acquire Media to optimally combine proprietary and public-domain classifiers so that the resulting classifications are more accurate and precise than any individual classifier.
- METABOT -- My colleagues and I developed improvements to Acquire Media's natural language processing engine for text semantic analysis. The outcome is the top-performing Metabot engine for taxonomic classification and named-entity recognition.
SOFTWARE: Drawing from my scientific research on software algorithms, neural computation, pattern recognition, and data analysis, I engineer unique capabilities and robustness into the software systems I build. The results include:
- HTML compiler -- I introduced an HTML template compiler at the core of Hearst Digital Media's page-rendering engine, deployed in server farms in multiple datacenters. This reduced computational load fourfold across Hearst Magazines' 48 websites and saved Hearst the cost of expanding its server deployment.
- MIMOSA -- I invented a dramatically faster method for identifying similarities between news stories. The result is a >50,000x speedup of news topic clustering, for Acquire Media's news search engine. We were granted US Patent 9,753,964 for the MIMOSA algorithms.
- ProxyMate -- My colleagues and I built a linearly scalable platform for private web browsing, with high-performance C code, for Lucent Technologies' entrepreneurial New Ventures Group. I served as CTO for the successful spinoff to NaviPath.
- ThoughtWheel -- I built a full-stack implementation of a new social media platform, for finding and engaging communities of like-minded members. I developed the front-end UI and back-end database for creating custom user profile questionnaires.
- Deal Knowledge -- I worked with an overseas outsourced development firm to design and build a custom knowledge base on potential investments for a venture capital firm, Ericsson Venture Partners. I managed the outsourcing relationship and collaborated with internal VC team members to ensure satisfactory delivery.
- FLASH-BANG -- I created a custom text-pattern compiler at Acquire Media for ultrafast stock traders. The generated code parses text faster than Lex-generated C state-machine code.
- KOTHIC -- I invented and built a suite of novel products that identify and deliver top news stories to customers, for Acquire Media. The underlying methods for automatically finding the top news stories are incorporated in several products.
- Pragma -- I architected and built an algorithmic crossing-network engine for optimally executing equity trades. I designed methods to make the trade engine more resistant to gaming.
- LimeBits -- I led a team that built a website component-sharing product for Lime Labs. I redesigned the product and rearchitected the implementation based on the investors' vision and the existing platform.