You are seeing the early beta version of Zanran. We would be delighted to receive any comments or suggestions. Gushing and excessive praise is always welcome.
For ideas and suggestions, please email us: helpdesk [at] zanran [dot] com.
We will attempt to reply to all (sensible) emails within 2 working days.
What is Zanran?
Zanran helps you to find ‘semi-structured’ data on the web. This is numerical data that people have presented as graphs, tables and charts. For example, the data could be a graph in a PDF report, a table in an Excel spreadsheet, or a bar chart shown as an image in an HTML page. This huge amount of information can be difficult to find using conventional search engines, which are focused primarily on finding text rather than graphs, tables and bar charts.
Put more simply: Zanran is Google for data.
How it works… technology overview
Zanran doesn't work by spotting wording in the text and looking for images – it's the other way round.
The system examines millions of images and decides for each one whether it's a graph, chart or table – whether it has numerical content.
The core technology consists of patented computer vision algorithms that decide whether an image is numerical, and they are accurate (about 98%). But the huge majority of images on the internet are not graphs, charts or tables. So even though the accuracy is high, you will still get some non-numerical images in your results.
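Why does a 98% accurate classifier still let non-numerical images through? Because numerical images are rare, even a small error rate on the enormous number of ordinary images adds up. The short calculation below illustrates this base-rate effect. It assumes, hypothetically, that the 98% figure applies equally to numerical and non-numerical images, and that 1 image in 100 on the web is numerical; neither number comes from Zanran's own measurements.

```python
def precision(sensitivity, specificity, prevalence):
    """Fraction of images flagged as 'numerical' that really are numerical.

    sensitivity: share of numerical images correctly flagged
    specificity: share of non-numerical images correctly rejected
    prevalence:  share of all images that are numerical (hypothetical)
    """
    true_pos = sensitivity * prevalence              # real graphs/tables kept
    false_pos = (1 - specificity) * (1 - prevalence) # ordinary images let through
    return true_pos / (true_pos + false_pos)


# Hypothetical figures: 98% right on both classes, 1 image in 100 numerical.
p = precision(0.98, 0.98, 0.01)
print(f"{p:.0%} of flagged images are numerical")  # prints "33% of flagged images are numerical"
```

Under these assumed figures, only about a third of flagged images would be genuinely numerical. The rarer numerical images are, the more the small error rate on everything else dominates.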
In comparison, finding tables is relatively simple. Once we have found a table, we then have to decide whether it is essentially numerical, and we have algorithms for that.
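The page does not describe how Zanran decides whether a table is "essentially numerical". One simple heuristic that could serve the same purpose is to count how many non-empty cells parse as numbers. The sketch below uses that approach. The clean-up rules (commas, a trailing %, a leading currency symbol) and the 0.5 threshold are illustrative assumptions, not Zanran's algorithm.

```python
import math

def is_numeric_cell(text):
    """True if a cell parses as a finite number after light clean-up."""
    # Hypothetical clean-up: drop thousands separators, a trailing %, a leading currency sign.
    cleaned = text.strip().replace(",", "").rstrip("%").lstrip("£$€")
    try:
        return math.isfinite(float(cleaned))  # rejects 'nan' and 'inf'
    except ValueError:
        return False

def is_numerical_table(rows, threshold=0.5):
    """Classify a table as numerical if enough non-empty cells are numbers."""
    cells = [c for row in rows for c in row if c.strip()]
    if not cells:
        return False
    numeric = sum(is_numeric_cell(c) for c in cells)
    return numeric / len(cells) >= threshold


sales = [["Year", "Sales (£k)"], ["2008", "1,200"], ["2009", "1,450"]]
staff = [["Name", "Role"], ["Jon", "Founder"], ["Yves", "Founder"]]
print(is_numerical_table(sales), is_numerical_table(staff))  # True False
```

Header rows are text even in numerical tables, so the threshold is a ratio rather than "all cells must be numbers".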
Our programs then take suitable text near each image or table and build the search engine around that text. At present, we extract tables and images from HTML, PDF and Excel files, and we will be processing PowerPoint and Word documents in the near future.
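Building "the search engine around that text" can be pictured as an inverted index. Each word in the text near a graph or table points back to that graph or table. The sketch below is a generic inverted index with AND-only matching and no ranking. The item ids and caption texts are hypothetical examples, and the page does not say how Zanran selects the "suitable text".

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case words and numbers; '2000-2010' becomes ['2000', '2010']."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(items):
    """Map each token to the ids of the graphs/tables whose nearby text contains it."""
    index = defaultdict(set)
    for item_id, nearby_text in items.items():
        for token in tokenize(nearby_text):
            index[token].add(item_id)
    return index

def search(index, query):
    """Return ids whose nearby text contains every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))  # copy so the index is not mutated
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical extracted items: id -> text found near the graph or table.
items = {
    "fig1.png": "Figure 3: UK GDP growth, 2000-2010",
    "table2": "Table 1: London population by borough",
}
index = build_index(items)
print(search(index, "GDP growth"))  # {'fig1.png'}
```

A production engine would add ranking, stemming and phrase handling on top of this basic structure.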
It is also worth mentioning that mapping the numerical content on the web would not have been possible without the development of open-source software, and without access to vast processing power and cheap storage in cloud computing.
Zanran has crawled most of the internet. But if you think there is a good site we've missed, please let us know.
Who we are
Jon Goldhill – Founder
I have a PhD in chemistry, an MBA from London Business School, and have worked for myself for many years. I really, really enjoy tech startups, not just my own, but investing in other people’s as well. It’s what gets my blood going and my brain functioning.
At the other end of the brain-activity spectrum, I love hill-walking. Six hours in the rain on a Scottish mountain is calming and surprisingly enjoyable. More so in the sunshine, of course.
Yves Dassas – Founder
African at heart and fluent in Swahili, I was born in the Congo (formerly Zaire), where I spent a happy childhood discovering the beauty of Africa and the fascinating culture of the Congolese people. I went on to study engineering in Belgium and then completed a PhD in electrochemistry at Columbia University in New York, an experience that opened my eyes to a culture built on dreams and a ‘can do’ attitude.
A free spirit by nature and unfit for a corporate environment, I moved to London (UK) to become an entrepreneur in technology. There I met the other founder, Jon Goldhill, with whom I have started a few businesses.
How it happened
After we sold our main telecom businesses in 2003-4, Yves took a two-year sabbatical to study mathematics: Number Theory and Group Theory. Luckily, he balanced his passion for algebra by learning about more practical subjects: image processing and machine learning. During this time, Jon caught up on cosmology and failed to get a new type of electronic alarm off the ground.
The two of us kept meeting regularly to discuss useless philosophical, religious and scientific questions while consuming great amounts of tea and coffee. During one of these caffeine-driven sessions, we imagined how, one day, a user would be able to enter the X and Y labels of a graph into a search engine and get back the resulting pie chart, bar chart, line graph or table. Zanran was born.
We understood then that building a numerical search engine would depend on developing computer vision algorithms capable of separating images with numerical content from all other images.
We reviewed the academic research literature to identify the computer vision algorithms used in the wider context of image categorisation and classification. To our surprise, we realised that even a simple task like finding lines in an image (using the classical Hough transform) was unreliable. Our practical application also brought additional technical challenges. The classification had to be very fast to cope with the billions of images on the internet, and it had to be very accurate because numerical images make up only a very small minority of all images. Faced with a computer vision field still in its infancy and with these unusual practical challenges, we decided to develop the technology in-house. We spent 18 months doing this, using a typical approach of feature extraction followed by machine learning. Our UK patent covers this work.
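For readers unfamiliar with it, the classical Hough transform that the paragraph mentions finds straight lines by letting every edge pixel "vote" for all the lines that could pass through it. Each line is described by an angle θ and a distance ρ from the origin, with ρ = x·cos θ + y·sin θ. The minimal pure-Python sketch below assumes the edge points have already been extracted (for example by an edge detector). The 1-degree and 1-pixel bin sizes are arbitrary choices. This illustrates the textbook technique only, not Zanran's patented method.

```python
import math
from collections import Counter

def hough_lines(points, n_theta=180, top_k=1):
    """Vote in (rho, theta) space for each edge point; return the strongest lines.

    points: iterable of (x, y) edge-pixel coordinates (assumed already detected)
    Returns a list of (rho, theta_degrees, votes), strongest first.
    """
    # Precompute cos/sin for each discrete angle in [0, 180) degrees.
    angles = [math.pi * i / n_theta for i in range(n_theta)]
    trig = [(math.cos(a), math.sin(a)) for a in angles]
    votes = Counter()
    for x, y in points:
        for i, (c, s) in enumerate(trig):
            rho = round(x * c + y * s)  # 1-pixel-wide distance bins
            votes[(rho, i)] += 1
    # Convert the angle index back to degrees for readability.
    return [(rho, i * 180 / n_theta, count) for (rho, i), count in votes.most_common(top_k)]


# A perfectly straight horizontal line of 100 pixels at y = 5.
points = [(x, 5) for x in range(100)]
print(hough_lines(points))  # [(5, 90.0, 100)]
```

On a clean synthetic line the peak is unambiguous. On real images, noise, thick strokes and text pixels spread votes across neighbouring bins, so choosing bin sizes and peak thresholds becomes delicate.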
We then assembled a team of outstanding software programmers to develop crawlers, write image and table extraction software for HTML, PDF and Excel files, build a search engine and deploy everything on servers in a cloud environment.
Time has flown since then, on a journey that took us from an idea four years ago to Zanran’s beta version today. It’s been a very enriching experience. We hope that you will find our search tool useful.
Over the last 4 years, Zanran has had the privilege of working with a small team of outstanding programmers, including Adrian, Alex, Gary and Tomasz.
Most of the work in the last year and a half has been driven by Tomasz, who demonstrates rare intellectual abilities: a practical approach to problem solving and a deep and wide knowledge of the open-source software world. His incredible productivity is no surprise for someone who describes sleep as a “waste of time”.
We are fortunate in having an experienced non-executive Chairman – John Yeomans. He brings different views from our own, asks the right questions, and knows everybody in the financial world.
Last but not least, help and entertainment around the office is provided by Freddie. He's a cheerful Old English Sheepdog who was acquired from Battersea Dogs Home in 2008.
Our thanks go to our funders: both the London Development Agency (who helped with our early R&D costs), and our more recent investors. May they all reap excellent returns!
Zanran Ltd. 26 Danbury Street, London, N1 8JU, UK. Regd in England no: 6,022,009