How stuff works - technology @ Fisheye Analytics - ashwinreddygayam
Part 1: What a day looks like
http://engineering.fisheyeanalytics.com
3 layers of technology
In the spirit of sharing knowledge, I will discuss, in a series of presentations, how we at Fisheye Analytics engineer large-scale software systems that solve complex problems. Fisheye runs its technology on 30 servers, with programs running 24x7 to deliver insightful media intelligence for its clients. The technology can be divided into three layers:
1. Crawling & Search Engines
2. Analytics Processing
3. Client Applications
What a day looks like
In this presentation, I’d like to shed some light on what a day in our server farm looks like. In AWS US East Coast, a handful of proprietary web crawlers running on a cluster of machines download tens of gigabytes of data a day, scouring millions of web pages and the Twitter and Facebook APIs. Another group of indexers indexes the fetched data using SOLR, after which the data is ready to be searched.
What a day looks like
Message queues (using ActiveMQ) get flooded with millions of messages and act as the backbone through which all machines on the cluster exchange information. Peaks of up to a thousand messages a second are not uncommon. Meanwhile, in the Singapore farm, database servers running MySQL (partitioned) see peaks of 600 transactions per second (read + write). Also in the Singapore farm, 7 different kinds of analytic programs (called DPPs internally), all highly multithreaded, feast on over 40 CPU cores in a separate cluster.
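To make the "backbone" idea concrete, here is a minimal sketch of several workers consuming from one shared queue. It uses Python's in-process `queue.Queue` purely as a stand-in for ActiveMQ; the function name `run_backbone`, the worker names, and the worker count are assumptions for illustration, not our production code.

```python
import queue
import threading


def run_backbone(messages, worker_count=3):
    """Push messages through one shared queue and return (worker, message) pairs."""
    backbone = queue.Queue()   # stand-in for a message broker queue
    handled = []               # records which worker handled which message
    lock = threading.Lock()    # protects `handled` across worker threads

    def worker(name):
        while True:
            msg = backbone.get()
            if msg is None:    # sentinel: no more work for this worker
                backbone.task_done()
                return
            with lock:
                handled.append((name, msg))
            backbone.task_done()

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for t in threads:
        t.start()
    for msg in messages:
        backbone.put(msg)
    for _ in threads:
        backbone.put(None)     # one sentinel per worker
    backbone.join()            # wait until every queued item is processed
    for t in threads:
        t.join()
    return handled
```

Calling `run_backbone(["article-1", "article-2", "tweet-1"])` hands each message to exactly one worker. A real broker adds persistence and network delivery on top of this pattern.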
What a day looks like
Shell scripts make real-time backups of articles stored on our NAS into the cloud (AWS EBS), encrypted with 128-bit AES. Copies of MySQL binary logs are continuously transferred to two different locations for incremental backup. Health monitoring programs run all day long, measuring message queue sizes, server uptimes, etc., and send email and text alerts to our mobile phones when they sense anything abnormal. The health monitoring programs also automatically restart programs and servers that stop unexpectedly.
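The health-monitoring loop described above can be sketched roughly as follows. This is a simplified illustration, not the actual monitoring code: the `check_health` function, the `HealthReport` type, the queue and program names, and the 100,000-message threshold are all assumptions. Sending the alerts and performing the restarts are left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class HealthReport:
    alerts: list = field(default_factory=list)    # messages to email or text
    restarts: list = field(default_factory=list)  # programs to restart


def check_health(queue_sizes, running, max_queue_size=100_000):
    """Inspect one snapshot of the farm and decide what to alert on and restart.

    queue_sizes: dict of queue name -> number of pending messages
    running:     dict of program name -> True if the program is up
    """
    report = HealthReport()
    for name, size in sorted(queue_sizes.items()):
        if size > max_queue_size:  # hypothetical threshold
            report.alerts.append(
                f"queue {name} has {size} messages (limit {max_queue_size})"
            )
    for name, is_up in sorted(running.items()):
        if not is_up:
            report.alerts.append(f"{name} is down; restarting")
            report.restarts.append(name)
    return report
```

For example, `check_health({"crawl": 150_000}, {"dpp-2": False})` yields two alerts and schedules `dpp-2` for a restart.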
What a day looks like
A central log monitoring server (internally called FishMon) pulls the logs of various programs and servers at regular intervals and stores them centrally, allowing developers to browse them in a rich interface to catch and fix bugs. For client reporting and analytics, client data in specific formats is indexed with the Sphinx search engine and queried by Media Lens (one of our products) and our client report generators. Replicated databases, search engine indexes, and message queues work as hot spares that take over when their masters go down.
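The hot-spare behaviour can be illustrated with a small selection routine: try the master first, then fall back to replicas in order. This is only a sketch under assumed names (`pick_active`, the node names, and the `is_healthy` callback are hypothetical); real failover for MySQL, search indexes, or message queues involves considerably more than this.

```python
def pick_active(nodes, is_healthy):
    """Return the first healthy node; `nodes` lists the master first, then its spares."""
    for node in nodes:
        if is_healthy(node):
            return node
    raise RuntimeError("no healthy node available")


# Usage: the master is down, so the first replica takes over.
down = {"db-master"}
active = pick_active(
    ["db-master", "db-replica-1", "db-replica-2"],
    lambda node: node not in down,
)
```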
More to it!
There is definitely more to it, and this is just a starter. After all, we boast state-of-the-art technology in different areas. We build such large, complex systems with only a handful of us. Sounds like a startup? If this kind of stuff gives you a kick, wait no longer and build your career with us. We are always looking for passionate and extremely talented engineers who can help us build technology that makes a difference. More information about our software architectures is coming in future presentations.
Thank you!
Thank you, everyone, and feel free to write to me at email@example.com. You can always drop by our Singapore office and say hi to our great engineers.