Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django



My presentation at PyCon APAC 2012




    Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django Presentation Transcript

    • Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django
    • About me?
        ● Rahmat Ramadhan Irianto
        ● Software developer at Void-Labs & Defpy-Labs, an open source software developer team
        ● Student at STMIK Dipanegara, Makassar, Indonesia (2010)
        ● Lives in Makassar, Indonesia
        ● Writes Python apps every day
    • What is Website-Monitoring?
        ● Website monitoring provides page change monitoring and notification services to internet users worldwide. It keeps a change log for each page and alerts the user by email when it detects a change in the page text.
    • What is it useful for?
        ● Website monitoring can watch almost any page on the internet, and when it detects a page change it alerts you by email.
        ● It can be part of a business intelligence strategy: track your competition and get timely alerts when they change their website, or watch for developments on your customers' websites.
        ● Monitor the press release page of companies you are invested in. Keep track of their current executives. Be alerted to changes on their home page.
        ● Companies on the web change their privacy policies or terms and conditions without notice. Website monitoring can alert you to these changes.
        ● Monitor the job listings pages of companies where you would like to work. When they post a new listing, we will email you.
        ● Keep up to date with the news. Monitor the news page of your favorite news site; when it is updated, you'll get an email alert.
        ● Inspired by changedetection.
        ● And much more
    • What powers Website-Monitoring?
    • Python! (Powerful, efficient, flexible, an ideal language, effective for OOP, elegant syntax, rich in libraries, etc.)
    • Django! (Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design, etc.)
    • MongoDB (flexible, powerful, fast, and easy to use)
    • RabbitMQ (a powerful, fast, reliable, and highly available message queuing system; an open source queueing option that is great for building and managing scalable applications)
    • Workflow Website-Monitoring
    • [Workflow diagram. Labels: Ajax POST, POST API request, REST API, MyView, save data, publish task, message queue, create worker, worker, process task, scrape page, save result, if page changed: save data, alert email, report diff, MongoDB]
    • Let's talk about
    • Why MongoDB?
        ● Combines great features of document databases, key-value stores, and relational databases.
        ● How great?
        ● Fast
        ● Smart
        ● Scalable
        ● Schema-less
        ● Dynamic queries
        ● Easy to use, etc.
    • What are we going to need? PyMongo (the Python driver for MongoDB)
    • How to?
        import pymongo
        from pymongo import Connection
        from datetime import datetime
        from django.utils.encoding import smart_str

        # One handle per collection in the website_monitor database
        collection_user = pymongo.Connection().website_monitor.user
        collection_monitor = pymongo.Connection().website_monitor.monitor
        collection_task = pymongo.Connection().website_monitor.task

        # INSERT
        monitor = {
            'username': smart_str(request.user),
            'url': url,
            'datetime': datetime.utcnow(),
            'status': status,
            'hit': 0,
            'fail_hit': 0,
            'period': int(request.POST.get('period')),
            'email': collection_user.find_one({'name': str(request.user)})['email'],
            'pk': pk,
            'last_checking': None,
            'task_id': task_id,
        }
        collection_monitor.insert(monitor)
    • UPDATE
        collection_user.update(
            {'name': data_user['id']},
            {'$set': {
                'email': data_user['email'],
                'firstname': smart_str(data_user['first_name']),
                'lastname': smart_str(data_user['last_name']),
                'ip': request.META.get('REMOTE_ADDR', 'unknown'),
                'user_agent': request.META.get('HTTP_USER_AGENT', 'unknown'),
                'session': request.META.get('XDG_SESSION_COOKIE', 'unknown'),
                'session_fb': session_key,
                'authkey': authkey,
            }})

        # REMOVE
        if collection_content.find({'url': i['url']}).count() == 3:
            collection_content.remove({'url': i['url'][0]})
    • Why must we use distributed computing? Distributed computing is a method of solving a computational problem by dividing it into many tasks that run simultaneously on many hardware or software systems (Wikipedia).
    • What is a message queue? Message queues are:
        ● Communication buffers
        ● Between independent sender and receiver processes
        ● Asynchronous: the time of sending is not necessarily the same as the time of receiving
      In the context of web applications:
        ● Sender: web application servers
        ● Receiver: background worker processes
        ● Queue items: tasks that the web server doesn't have the time or resources to do
    • How does it work? Say a web application server has a task it doesn't have time to do:
        ● It puts the task in the message queue
        ● Other web servers can access the same queue(s) and put tasks there
        ● Workers are greedy and they all watch the queues for tasks
        ● Workers asynchronously pick up the first available task on the queue when they are ready
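The producer/worker pattern described on this slide can be sketched in-process with Python's standard library. This is only an illustration, written for Python 3: it uses `queue.Queue` and threads instead of RabbitMQ, and `run_workers`, the handler, and the example URLs are hypothetical names that do not appear in the talk.

```python
import queue
import threading


def run_workers(tasks, handler, num_workers=3):
    """Process every task on a shared queue with a pool of worker threads."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()       # block until a task is available
            if task is None:              # sentinel: no more work for this worker
                task_queue.task_done()
                break
            outcome = handler(task)
            with results_lock:
                results.append(outcome)
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:                    # the "web server" side: enqueue and move on
        task_queue.put(task)
    for _ in threads:                     # one sentinel per worker
        task_queue.put(None)
    task_queue.join()                     # wait until every item has been handled
    for t in threads:
        t.join()
    return results


# Hypothetical usage: pretend each task is a URL that needs checking.
checked = run_workers(
    ["http://example.com/a", "http://example.com/b"],
    lambda url: ("checked", url),
)
print(sorted(checked))
```

With a real broker the queue lives outside the web process, so workers can run on other machines and tasks can survive a restart of the sender.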
    • What is it useful for? Message queues are useful in certain situations. General guidelines:
        ● Does your web application take more than a few seconds to generate a response?
        ● Are you using a lot of cron jobs to process data in the background?
        ● Do you wish you could distribute the processing of the data generated by your application among many servers?
    • What do we need to build a message queue?
    • AMQP & RabbitMQ
    • Why choose AMQP & RabbitMQ?
        1. RabbitMQ is free to use
        2. The documentation is decent
        3. There is decent clustering support, even though we never needed clustering
        4. We didn't want to lose queues or messages upon broker crash/restart
        5. We develop applications using Python/Django, and setting up an AMQP backend using carrot was easy
    • Now let's talk about RabbitMQ
    • RabbitMQ? RabbitMQ is an Erlang-based open source application that serves as a message broker, or message-oriented middleware. RabbitMQ implements the application-layer protocol AMQP (Advanced Message Queuing Protocol). AMQP provides a standard protocol that is interoperable between vendors and governs the exchange of messages in enterprise-scale systems.
    • Why use RabbitMQ? We need it for:
        ● Running tasks/processes in the background
        ● Asynchronous task processing
        ● Scheduling system, etc.
    • So... what keeps the rabbit focused?
    • Carrot ! Carrot is an AMQP messaging queue framework. AMQP is the Advanced Message Queuing Protocol, an open standard protocol for message orientation, queuing, routing, reliability and security. Easy way to connect to RabbitMQ. Easy way to pull stuff out of the queue. Easy way to throw stuff into the queue.
    • Concepts
        ● Publishers: publishers send messages to an exchange.
        ● Exchanges: messages are sent to exchanges. Exchanges are named and can be configured to use one of several routing algorithms. The exchange routes the messages to consumers by matching the routing key in the message with the routing key the consumer provides when binding to the exchange.
        ● Consumers: consumers declare a queue, bind it to an exchange, and receive messages from it.
        ● Queues: queues receive messages sent to exchanges. The queues are declared by consumers.
        ● Routing keys: every message has a routing key. The interpretation of the routing key depends on the exchange type. There are four default exchange types defined by the AMQP standard, and vendors can define custom types (so see your vendor's manual for details).
      Exchange types defined by AMQP/0.8:
        ● Direct exchange: matches if the routing key property of the message and the routing_key attribute of the consumer are identical.
        ● Fan-out exchange: always matches, even if the binding does not have a routing key.
        ● Topic exchange: matches the routing key property of the message by a primitive pattern matching scheme.
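To make the three exchange types listed above concrete, here is a small pure-Python sketch of the matching rules. It is not carrot or RabbitMQ code: `route` and `topic_matches` are hypothetical helpers written for this illustration in Python 3, and the headers exchange (the fourth standard type, not covered on the slide) is left out. Topic patterns follow the AMQP convention that `*` stands for exactly one dot-separated word and `#` for zero or more words.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic matching on dot-separated words."""
    def match(p, k):
        if not p:
            return not k                   # pattern exhausted: key must be too
        if p[0] == "#":
            # '#' may consume zero or more words of the key
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if p[0] == "*" or p[0] == k[0]:    # '*' matches exactly one word
            return match(p[1:], k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the queue names that would receive a message.

    bindings is a list of (queue_name, binding_key) pairs.
    """
    if exchange_type == "fanout":
        return [q for q, _ in bindings]
    if exchange_type == "direct":
        return [q for q, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [q for q, key in bindings if topic_matches(key, routing_key)]
    raise ValueError("unsupported exchange type: %s" % exchange_type)


# Hypothetical queues and keys, for illustration only.
bindings = [("checker", "check"), ("mailer", "mail")]
print(route("direct", bindings, "check"))
print(route("fanout", bindings, "anything"))
print(topic_matches("site.*.changed", "site.example.changed"))
```

The monitoring application in this talk uses a direct exchange, so only the equality rule applies there.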
    • Creating a connection in Django
        # settings.py
        RABBITMQ_HOST = 'localhost'
        RABBITMQ_PORT = 5672
        RABBITMQ_USER = 'guest'
        RABBITMQ_PASS = 'guest'
        RABBITMQ_VHOST = '/'

        # views.py
        from carrot.messaging import Publisher, Consumer
        from carrot.connection import AMQPConnection
        from django.conf import settings

        # carrot names the virtual host keyword "virtual_host"
        conn_for_carrot = AMQPConnection(
            hostname=settings.RABBITMQ_HOST,
            port=settings.RABBITMQ_PORT,
            userid=settings.RABBITMQ_USER,
            password=settings.RABBITMQ_PASS,
            virtual_host=settings.RABBITMQ_VHOST)
    • Publisher
        import hashlib

        # Ask the workers to check a task
        publisher = Publisher(connection=conn_for_carrot,
                              exchange='website_monitoring_exchange',
                              exchange_type='direct')
        publisher.send({'msg': {'do': 'check', 'task_id': task_id}})

        # Same message, with task_id hashed together with the submitted URL
        publisher = Publisher(connection=conn_for_carrot,
                              exchange='website_monitoring_exchange',
                              exchange_type='direct')
        publisher.send({'msg': {
            'do': 'check',
            'task_id': hashlib.md5(str(task_id) + request.PUT.get('url')).hexdigest(),
        }})
    • Consumer
        import subprocess

        def monitoring_check():
            def call(message_data, message):
                if message_data['msg']['do'] == 'check':
                    print '[+] receiving message'
                    message.ack()
                    task_id = message_data['msg']['task_id']
                    # TASK_SCRIPT is a placeholder name: the script path is missing from the transcript
                    get_pid = subprocess.Popen(['python', TASK_SCRIPT, task_id])
                    pid = get_pid.pid
                    collection_task.update({'task_id': task_id},
                                           {'$set': {'status': 'RUNNING', 'pid': pid}})
                    print '[Starting PID:%s]' % pid
                    get_pid.wait()
                else:
                    message.ack()

            queuename = 'website_monitoring_checker'
            consumer = Consumer(connection=conn_for_carrot, queue=queuename,
                                exchange='website_monitoring_exchange',
                                exchange_type='direct')
            consumer.register_callback(call)
            try:
                print '[queue:%s]consume..' % queuename
                consumer.wait()
            except Exception, err:
                print err
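    • Cooking soup with BeautifulSoup?
        import re
        from BeautifulSoup import BeautifulSoup

        def visible(element):
            # Skip text that lives inside non-rendered tags
            if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
                return False
            # The slide tested three regular expressions here; only the start of the
            # first ('<!--', an HTML comment) survives in the transcript
            if re.match('<!--', str(element)):
                return False
            return True

        monitor = collection_monitor.find_one({'pk': pk})
        # The two most recent snapshots stored for this URL
        contents = [collection_content.find({'url': str(monitor['url'])})[1],
                    collection_content.find({'url': str(monitor['url'])})[0]]
        for i in contents:
            texts = BeautifulSoup(BeautifulSoup(i['content']).prettify()).findAll(text=True)
            data = {'content': ''.join(filter(visible, texts)),
                    'datetime': i['datetime']}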
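The slides do not show how the two extracted texts are compared to produce the "report diff" step of the workflow. One possible approach, assumed here rather than taken from the talk, is the standard library's `difflib`. The sketch is written for Python 3; `text_diff` and the sample page texts are hypothetical.

```python
import difflib


def text_diff(old_text, new_text):
    """Return unified-diff lines between two page snapshots (empty list if unchanged)."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


# Hypothetical visible text from two consecutive checks of the same page.
old = "Welcome\nPrice: 10 USD"
new = "Welcome\nPrice: 12 USD"

changes = text_diff(old, new)
if changes:
    # A non-empty diff means the page changed and an alert should be sent.
    print("\n".join(changes))
```

An empty result means the page text is unchanged, so no alert is needed.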
    • Alert by email!
        import smtplib

        def sending_email(to, sub, msg):
            try:
                gmail_user = ''                      # sender address is blank in the transcript
                gmail_pwd = '***************'
                smtpserver = smtplib.SMTP("", 587)   # SMTP host is blank in the transcript
                smtpserver.ehlo()
                smtpserver.starttls()
                smtpserver.ehlo()
                smtpserver.login(gmail_user, gmail_pwd)
                header = ('To: ' + to + '\n' +
                          'From: Website-Monitoring <' + gmail_user + '>\n' +
                          'Subject: %s\n' % sub)
                # A blank line separates the headers from the body
                msg = header + '\n' + msg
                smtpserver.sendmail(gmail_user, to, msg)
                smtpserver.close()
            except Exception, err:
                print err
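As an alternative to concatenating header strings by hand, the message can be assembled with the standard library's `email.message.EmailMessage`, which takes care of header formatting and the header/body separator. This is a sketch for Python 3, not the code used in the talk; `build_alert` and all addresses are placeholders.

```python
from email.message import EmailMessage


def build_alert(sender, recipient, subject, body):
    """Build a plain-text alert message without sending it."""
    msg = EmailMessage()
    msg["From"] = "Website-Monitoring <%s>" % sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Placeholder addresses and text, for illustration only.
alert = build_alert(
    "monitor@example.com",
    "user@example.com",
    "Page changed",
    "The page at http://example.com/ changed.",
)
print(alert.as_string())
```

An object built this way can be handed to `smtplib.SMTP.send_message` after the same `starttls()` and `login()` steps shown on the slide.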
    • Task / scheduled checking
        import sys
        import time

        task_id = sys.argv[1]
        print task_id
        raw_delay = collection_task.find_one({'task_id': task_id})['schedule']
        print raw_delay

        # Convert the stored schedule to a delay in seconds
        if raw_delay == "1":
            delay = 60 * 60
        elif raw_delay == "12":
            delay = 720 * 60
        else:
            delay = 1440 * 60

        while True:
            try:
                print '[+] Starting task: %s' % sys.argv[1]
                log(task_id, 'INFO', 'starting session')
                main()
            except Exception, err:
                log(task_id, 'exception', err)
                print err
                collection_task.update({'task_id': task_id},
                                       {'$set': {'status': 'STOPPED', 'pid': None}})
                log(task_id, 'INFO', 'updating database [status:STOPPED]')
            else:
                collection_task.update({'task_id': task_id},
                                       {'$set': {'status': 'SLEEP', 'pid': None}})
                log(task_id, 'INFO', 'updating database [status:SLEEP] for %s sec' % delay)
            time.sleep(delay)
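The schedule-to-delay branch on this slide works out to 1, 12, or 24 hours expressed in seconds. A table-driven version makes that arithmetic explicit. This is a sketch for Python 3; `delay_for` and `SECONDS_PER_HOUR` are hypothetical names, and the reading of the stored values as hours is inferred from the constants on the slide.

```python
SECONDS_PER_HOUR = 60 * 60


def delay_for(schedule):
    """Map the stored schedule value ("1", "12", or anything else) to seconds."""
    hours = {"1": 1, "12": 12}.get(schedule, 24)   # anything else falls back to 24 hours
    return hours * SECONDS_PER_HOUR


for value in ("1", "12", "24"):
    print(value, delay_for(value))
```

The results equal the slide's constants: 60*60 = 3600, 720*60 = 43200, and 1440*60 = 86400 seconds.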
    • Django-Piston (a mini-framework for Django, but powerful for creating RESTful APIs)
        ● Ties into Django's internal mechanisms.
        ● Supports OAuth out of the box (as well as Basic/Digest or custom auth).
        ● Doesn't require tying to models, allowing arbitrary resources.
        ● Speaks JSON, YAML, Python Pickle & XML (and HATEOAS).
        ● Ships with a convenient reusable library in Python.
        ● Respects and encourages proper use of HTTP (status codes, ...).
        ● Has built-in (optional) form validation (via Django), throttling, etc.
        ● Supports streaming, with a small memory footprint.
        ● Stays out of your way.
    • How to?
        Include in urls.py:
            url(r'^api/', include('api.urls')),
        Include in settings.py:
            INSTALLED_APPS = (
                ….......
                'api',
        Create a folder named /api/ in the project directory, and the files:
            -API/
    • REST API: urls.py
        from django.conf.urls.defaults import *
        from piston.resource import Resource
        from piston.authentication import HttpBasicAuthentication
        from api.handlers import *

        # Every resource is protected with HTTP Basic authentication
        auth = HttpBasicAuthentication(realm="website-monitoring")
        ad = {'authentication': auth}
        main = Resource(handler=Main, **ad)
        monitor = Resource(handler=Monitor, **ad)

        urlpatterns = patterns('',
            url(r'^(?P<obj_id>[^/]+)/$', main),
            url(r'^monitor/(?P<obj_id>[^/]+)/$', monitor),
        )
    • REST API: handlers.py
        from piston.handler import BaseHandler
        from piston.utils import rc

        class Main(BaseHandler):
            allowed_methods = ('GET',)   # trailing comma makes this a tuple

            def read(self, request, obj_id):
                # Look the id up among users first, then among monitors
                data = collection_user.find_one({'pk': obj_id})
                if data:
                    return data
                data = collection_monitor.find_one({'pk': obj_id})
                if data:
                    return data
    •   class Monitor(BaseHandler):
            allowed_methods = ('GET', 'PUT', 'DELETE')
            fields = ('url', 'status', 'hit', 'fail_hit', 'year', 'month', 'day',
                      'hour', 'email', 'period', 'diff')

            def read(self, request, obj_id):
                try:
                    if obj_id == 'all':
                        data = list(collection_monitor.find({'username': str(request.user)}))
                    elif obj_id == "status_running":
                        data = list(collection_monitor.find({'status': 'running'}))
                    ….........
                except Exception, err:
                    return rc.BAD_REQUEST
                return data

            def update(self, request, obj_id):
                try:
                    if obj_id == 'create':
                        # Collect the URLs this user already monitors
                        url_list = []
                        for i in collection_monitor.find({'username': str(request.user)}):
                            url_list.append(i['url'])
                        if request.PUT.get('url') in url_list:
                            print '[+] URL already exists'
                            print '[+] Data will be updated'
                    else:
                        raise Exception
                except Exception, err:
                    print err
                    return rc.BAD_REQUEST
                ….....................
    •       def delete(self, request, obj_id):
                try:
                    if obj_id == 'all':
                        # remove() deletes every document matching the query
                        collection_monitor.remove({'username': str(request.user)})
                    else:
                        if collection_monitor.find_one({'pk': obj_id}):
                            collection_monitor.remove({'pk': obj_id})
                except Exception, err:
                    print err
                    return rc.FORBIDDEN
                else:
                    print 'deleted'
                    return rc.DELETED
    • Facebook integration?
        ● Just for lazy people
        ● You don't have to fill in the registration form: just log in to your Facebook account, then click, click, and click.
        ● Good for business marketing
        ● Easy to integrate, etc.
        ● Download:
        ● git clone
    • Questions?
        ● Twitter: @jimmyromanticde
        ● Facebook: mantic.devil
        ● Email:
        ● Bitbucket:
        ● Blog:
    • References: "Life in a Queue" by Tareque Hossain; Google search for "Message Queue"
    • Thank You ! :)