Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django

My presentation at PyCon APAC 2012.

  1. Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django
  2. About me
      ● Rahmat Ramadhan Irianto
      ● Software developer at Void-Labs & Defpy-Labs (an open source software developer team)
      ● Student at STMIK Dipanegara, Makassar (2010 intake)
      ● Lives in Makassar, Indonesia
      ● Writes Python apps every day
  3. What is Website-Monitoring?
      ● Website-Monitoring is a page-change monitoring and notification service for internet users worldwide. It keeps a change log for each watched page and alerts the user by email when it detects a change in the page text.
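The change check described above amounts to comparing the previous snapshot of a page's text with the current one and turning any difference into a change-log entry. Here is a minimal sketch, assuming each visit stores the visible page text as a string. The helper name `detect_change` and its return shape are hypothetical and do not come from the project's code. It uses only `difflib` from the standard library.

```python
import difflib


def detect_change(old_text, new_text, url="page"):
    """Return a unified diff (str) if the text changed, else None.

    Hypothetical helper: old_text/new_text are two stored snapshots of
    the visible text of the same page.
    """
    if old_text == new_text:
        return None  # nothing changed, so no alert is needed
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=url + " (previous)",
        tofile=url + " (current)",
        lineterm="",  # splitlines() already removed the newlines
    )
    return "\n".join(diff)


if __name__ == "__main__":
    report = detect_change("Price: $10\nIn stock", "Price: $12\nIn stock",
                           url="http://example.com/product")
    print(report)
```

The diff text can go into the change log as-is and also serve as the body of the alert email.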
  4. What is it useful for?
      ● Website-Monitoring can watch almost any page on the internet and alert you by email when it detects a change.
      ● It can be part of a business intelligence strategy: track your competitors and get timely alerts when they change their websites, or watch for developments on your customers' websites.
      ● Monitor the press release pages of companies you have invested in, keep track of their current executives, and be alerted to changes on their home pages.
      ● Companies on the web sometimes change their privacy policies or terms and conditions without notice; Website-Monitoring can alert you to these changes.
      ● Monitor the job listing pages of companies where you would like to work. When they post a new listing, we will email you.
      ● Keep your news up to date: monitor the news pages of your favorite news sites and get an email alert when they are updated.
      ● And much more (inspired by http://www.changedetection.com)
  5. What powers Website-Monitoring? http://goo.gl/hCf34
  6. Python! (powerful, efficient, flexible, an ideal language, effective for OOP, elegant syntax, rich libraries, etc.) http://www.python.org http://goo.gl/sSqHh
  7. Django! (Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design, etc.) https://www.djangoproject.com/ http://goo.gl/YXnA9
  8. MongoDB (flexible, powerful, fast, and easy to use) http://www.mongodb.org http://goo.gl/NZQ18
  9. RabbitMQ (a powerful, fast, reliable, and highly available message queuing system; an open source queuing option that is great for building and managing scalable applications) http://www.rabbitmq.com http://goo.gl/Pvd9Q
  10. Workflow of Website-Monitoring
  11. [Workflow diagram] An Ajax post (Myview) or a Post API request (REST API) saves the data to MongoDB and publishes a task to the message queue. A worker is created to process the task: it scrapes the page and saves the result. If the page changed, it saves the data, reports the diff, and sends an email alert.
  12. Let's talk about http://goo.gl/m8QUH
  13. Why MongoDB?
      ● It combines great features of document databases, key-value stores, and relational databases.
      ● How great?
      ● Fast
      ● Smart
      ● Scalable
      ● Schema-less
      ● Dynamic queries
      ● Easy to use, etc.
  14. What are we going to need? Python + MongoDB = PyMongo http://pypi.python.org/pypi/pymongo/
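  15. How to?
        # Legacy PyMongo 2.x API (2012). Modern PyMongo replaces Connection
        # with MongoClient and insert() with insert_one().
        from datetime import datetime

        from django.utils.encoding import smart_str
        from pymongo import Connection

        db = Connection().website_monitor  # one connection, reused below
        collection_user = db.user
        collection_monitor = db.monitor
        collection_task = db.task

        # INSERT (inside a Django view: request, url, status, pk and task_id
        # come from the surrounding view code)
        monitor = {
            'username': smart_str(request.user),
            'user_id': request.user.id,
            'url': url,
            'datetime': datetime.utcnow(),
            'status': status,
            'hit': 0,
            'fail_hit': 0,
            'period': int(request.POST.get('period')),
            'email': collection_user.find_one({'name': str(request.user)})['email'],
            'pk': pk,
            'last_checking': None,
            'task_id': task_id,
        }
        collection_monitor.insert(monitor)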
  16. UPDATE
        collection_user.update(
            {'name': data_user['id']},
            {'$set': {
                'email': data_user['email'],
                'firstname': smart_str(data_user['first_name']),
                'lastname': smart_str(data_user['last_name']),
                'ip': request.META.get('REMOTE_ADDR', 'unknown'),
                'login': datetime.now(),
                'user_agent': request.META.get('HTTP_USER_AGENT', 'unknown'),
                'session': request.META.get('XDG_SESSION_COOKIE', 'unknown'),
                'session_fb': session_key,
                'ts': datetime.now(),
                'authkey': authkey,
            }},
        )

      REMOVE
        # Keep at most two content snapshots per URL: once a third exists,
        # delete the oldest one.
        if collection_content.find({'url': i['url']}).count() == 3:
            oldest = collection_content.find({'url': i['url']}).sort('datetime', 1)[0]
            collection_content.remove({'_id': oldest['_id']})
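The REMOVE step above limits how many content snapshots are kept for each URL. You can check that rule without a database. Here is a minimal in-memory sketch, assuming each snapshot is a dict with `url` and `datetime` keys as on this slide. The `prune_snapshots` helper and the keep-limit of 2 are hypothetical: with that limit, a third snapshot evicts the oldest. With modern PyMongo the same eviction would be a `find(...).sort('datetime', 1)` followed by `delete_one({'_id': ...})`.

```python
from datetime import datetime, timedelta


def prune_snapshots(snapshots, url, keep=2):
    """Return (kept, removed) after keeping the `keep` newest snapshots of `url`.

    Hypothetical in-memory model of the REMOVE step; snapshots of other
    URLs are left untouched and the original order is preserved.
    """
    same_url = sorted((s for s in snapshots if s["url"] == url),
                      key=lambda s: s["datetime"])  # oldest first
    excess = max(len(same_url) - keep, 0)
    removed = same_url[:excess]
    removed_ids = {id(s) for s in removed}
    kept = [s for s in snapshots if id(s) not in removed_ids]
    return kept, removed


if __name__ == "__main__":
    t0 = datetime(2012, 6, 1)
    snaps = [{"url": "http://example.com", "datetime": t0 + timedelta(hours=h),
              "content": "v%d" % h} for h in range(3)]
    kept, removed = prune_snapshots(snaps, "http://example.com")
    print([s["content"] for s in kept], [s["content"] for s in removed])
```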
  17. Why must we use distributed computing?
      Distributed computing is a method of solving a computational problem by dividing it into many tasks that run simultaneously on many hardware or software systems (Wikipedia).
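In this project, checking many monitored pages splits naturally into independent tasks, one per URL. Here is a minimal single-machine sketch of that split using `concurrent.futures` from the standard library. The `fetch` function is a hypothetical stand-in for the real scraper, so the example runs offline. Spreading the same tasks across several machines is what the message queue on the following slides is for.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib


def fetch(url):
    """Hypothetical stand-in for downloading a page."""
    return "<html>content of %s</html>" % url


def check(url):
    """One independent task: fetch a page and fingerprint its content."""
    body = fetch(url)
    return url, hashlib.md5(body.encode("utf-8")).hexdigest()


def check_all(urls, workers=4):
    # Each URL becomes a separate task; the pool runs them concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check, urls))


if __name__ == "__main__":
    for url, digest in check_all(["http://a.example", "http://b.example"]).items():
        print(url, digest)
```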
  18. What is a message queue?
      Message queues are:
      ● Communication buffers
      ● Between independent sender and receiver processes
      ● Asynchronous: the time of sending is not necessarily the same as the time of receiving
      In the context of web applications:
      ● Sender: web application servers
      ● Receiver: background worker processes
      ● Queue items: tasks that the web server doesn't have the time or resources to do
  19. How does it work?
      Say a web application server has a task it doesn't have time to do:
      ● It puts the task in the message queue.
      ● Other web servers can access the same queue(s) and put tasks there.
      ● Workers are greedy: they all watch the queues for tasks.
      ● Workers asynchronously pick up the first available task on the queue when they are ready.
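You can model the behaviour on this slide in a single process with the standard library's `queue.Queue` and threads: producers put tasks on a shared queue, and greedy workers each take the first available task when they are ready. This is only an in-process analogy, not RabbitMQ. A real broker adds persistence, acknowledgements and network access. The task dicts resemble the publisher's `{'do': 'check', 'task_id': ...}` payload. `process` is a hypothetical placeholder for the real work.

```python
import queue
import threading


def process(task):
    """Hypothetical placeholder for the real work (e.g. scraping a page)."""
    return "checked %s" % task["task_id"]


def worker(tasks, results, lock):
    # Greedy worker: block until a task is available, take it, repeat.
    while True:
        task = tasks.get()
        if task is None:          # sentinel: no more work for this worker
            tasks.task_done()
            break
        outcome = process(task)
        with lock:
            results.append(outcome)
        tasks.task_done()


def run(task_ids, n_workers=3):
    tasks = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    # The "web server" side: enqueue tasks without waiting for them.
    for task_id in task_ids:
        tasks.put({"do": "check", "task_id": task_id})
    for _ in threads:
        tasks.put(None)           # one sentinel per worker
    tasks.join()                  # wait until every item has been processed
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run(["t1", "t2", "t3"]))
```

The order of `results` depends on which worker finishes first, which is exactly the asynchronous behaviour the slide describes.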
  20. What are they useful for?
      ● Message queues are useful in certain situations.
      ● General guidelines:
        ○ Do your web applications take more than a few seconds to generate a response?
        ○ Are you using a lot of cron jobs to process data in the background?
        ○ Do you wish you could distribute the processing of the data generated by your application among many servers?
  21. What do we need to make a message queue?
  22. AMQP & RabbitMQ
  23. Why choose AMQP & RabbitMQ?
      1. RabbitMQ is free to use.
      2. The documentation is decent.
      3. There is decent clustering support, even though we never needed clustering.
      4. We didn't want to lose queues or messages upon a broker crash or restart.
      5. We develop applications using Python/Django, and setting up an AMQP backend using carrot was easy.
  24. Now let's talk about RabbitMQ
  25. RabbitMQ?
      RabbitMQ is an Erlang-based open source application that serves as a message broker, or message-oriented middleware. RabbitMQ implements an application layer protocol, the Advanced Message Queuing Protocol (AMQP). AMQP provides an interoperable standard protocol between vendors to regulate the exchange of messages in enterprise-scale systems.
  26. Why use RabbitMQ?
      ● We need it for:
        ○ Running tasks/processes in the background
        ○ Asynchronous task processing
        ○ A scheduling system, etc.
  27. So... what makes a rabbit focus?
  28. Carrot!
      Carrot is an AMQP messaging queue framework. AMQP is the Advanced Message Queuing Protocol, an open standard protocol for message orientation, queuing, routing, reliability, and security.
      ● An easy way to connect to RabbitMQ
      ● An easy way to pull stuff out of the queue
      ● An easy way to throw stuff into the queue
      https://github.com/ask/carrot/
  29. Concepts
      ● Publishers: publishers send messages to an exchange.
      ● Exchanges: messages are sent to exchanges. Exchanges are named and can be configured to use one of several routing algorithms. The exchange routes messages to consumers by matching the routing key in the message with the routing key the consumer provides when binding to the exchange.
      ● Consumers: a consumer declares a queue, binds it to an exchange, and receives messages from it.
      ● Queues: queues receive messages sent to exchanges. Queues are declared by consumers.
      ● Routing keys: every message has a routing key. The interpretation of the routing key depends on the exchange type. There are four default exchange types defined by the AMQP standard, and vendors can define custom types (see your vendor's manual for details).
      ● Exchange types defined by AMQP/0.8:
        ○ Direct exchange: matches if the routing key property of the message and the routing_key attribute of the consumer are identical.
        ○ Fan-out exchange: always matches, even if the binding does not have a routing key.
        ○ Topic exchange: matches the routing key property of the message by a primitive pattern matching scheme.
        ○ The fourth standard type, the headers exchange, routes on message header values instead of the routing key.
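The direct, fan-out and topic routing rules listed above can each be written as a small matching function. Here is a minimal sketch, assuming RabbitMQ's topic semantics: routing keys are dot-separated words, `*` in a binding matches exactly one word, and `#` matches zero or more words. It models only the matching decision, not a broker, and the function names are hypothetical.

```python
def direct_matches(binding_key, routing_key):
    # Direct exchange: the keys must be identical.
    return binding_key == routing_key


def fanout_matches(binding_key, routing_key):
    # Fan-out exchange: every bound queue receives the message.
    return True


def topic_matches(binding_key, routing_key):
    """Topic exchange: '*' = exactly one word, '#' = zero or more words."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p, w):
        if p == len(pattern):
            return w == len(words)
        if pattern[p] == "#":
            # '#' may swallow zero or more of the remaining words.
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False
        if pattern[p] == "*" or pattern[p] == words[w]:
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


if __name__ == "__main__":
    print(topic_matches("monitor.*.changed", "monitor.page.changed"))
    print(topic_matches("monitor.#", "monitor.page.x.changed"))
```

The project's own exchange is `direct`, so only the first rule is in play there. The topic rule becomes useful if alerts ever need to be routed by category.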
  30. Creating a connection in Django
      settings.py
        RABBITMQ_HOST = 'localhost'
        RABBITMQ_PORT = 5672
        RABBITMQ_USER = 'guest'
        RABBITMQ_PASS = 'guest'
        RABBITMQ_VHOST = '/'

      views.py
        from carrot.connection import AMQPConnection
        from carrot.messaging import Publisher, Consumer
        from django.conf import settings

        conn_for_carrot = AMQPConnection(hostname=settings.RABBITMQ_HOST,
                                         port=settings.RABBITMQ_PORT,
                                         userid=settings.RABBITMQ_USER,
                                         password=settings.RABBITMQ_PASS,
                                         virtual_host=settings.RABBITMQ_VHOST)
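The five settings above are exactly the parts of an AMQP URI. Clients that follow RabbitMQ's URI format, such as pika's `URLParameters`, accept a URI as a single string. Here is a minimal sketch that assembles one from the same values. The function name is hypothetical. Each component is percent-encoded, so the default vhost `/` becomes `%2F` in the URI path.

```python
from urllib.parse import quote

# Same values as settings.py on this slide.
RABBITMQ_HOST = "localhost"
RABBITMQ_PORT = 5672
RABBITMQ_USER = "guest"
RABBITMQ_PASS = "guest"
RABBITMQ_VHOST = "/"


def amqp_uri(host, port, user, password, vhost):
    """Build amqp://user:pass@host:port/vhost with each part percent-encoded."""
    return "amqp://%s:%s@%s:%d/%s" % (
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),  # "/" becomes "%2F"
    )


if __name__ == "__main__":
    print(amqp_uri(RABBITMQ_HOST, RABBITMQ_PORT, RABBITMQ_USER,
                   RABBITMQ_PASS, RABBITMQ_VHOST))
```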
  31. Publisher
        publisher = Publisher(connection=conn_for_carrot,
                              exchange='website_monitoring_exchange',
                              exchange_type='direct')
        publisher.send({'msg': {'do': 'check',
                                'task_id': task_id}})
        publisher.close()

        # Second example (REST API update handler): the task id is the MD5
        # hex digest of the old task id concatenated with the submitted URL.
        publisher = Publisher(connection=conn_for_carrot,
                              exchange='website_monitoring_exchange',
                              exchange_type='direct')
        publisher.send({'msg': {'do': 'check',
                                'task_id': hashlib.md5(str(task_id) + request.PUT.get('url')).hexdigest()}})
        publisher.close()
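The message body on this slide is a plain nested dict. The second publisher derives its `task_id` as the MD5 hex digest of the old task id concatenated with the URL. Here is a minimal sketch of both as pure functions, assuming the payload is serialized to JSON before it goes on the wire. The function names are hypothetical. The explicit UTF-8 encoding is required because Python 3's `hashlib.md5` only accepts bytes.

```python
import hashlib
import json


def derive_task_id(task_id, url):
    """MD5 hex digest of str(task_id) + url, as on the slide."""
    return hashlib.md5((str(task_id) + url).encode("utf-8")).hexdigest()


def check_message(task_id):
    """The {'msg': {'do': 'check', 'task_id': ...}} body sent by the publisher."""
    return {"msg": {"do": "check", "task_id": task_id}}


if __name__ == "__main__":
    body = check_message(derive_task_id(42, "http://example.com"))
    print(json.dumps(body))
```

Because the id is a hash of its inputs, publishing the same task/URL pair twice yields the same id. The task document can therefore be looked up or updated without creating duplicates.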
  32. Consumer
        import subprocess

        def monitoring_check():
            def call(message_data, message):
                if message_data['msg']['do'] == 'check':
                    print('[+] receiving message')
                    message.ack()
                    task_id = message_data['msg']['task_id']
                    # Run the scraper for this task in its own process.
                    get_pid = subprocess.Popen(['python', 'scraper.py', task_id])
                    pid = get_pid.pid
                    collection_task.update({'task_id': task_id},
                                           {'$set': {'status': 'RUNNING', 'pid': pid}})
                    print('[Starting PID:%s]' % pid)
                    # No get_pid.wait() here: scraper.py loops forever (see the
                    # scheduling slide), so waiting would block this consumer.
                else:
                    message.ack()

            queuename = 'website_monitoring_checker'
            consumer = Consumer(connection=conn_for_carrot, queue=queuename,
                                exchange='website_monitoring_exchange',
                                exchange_type='direct')
            consumer.register_callback(call)
            try:
                print('[queue:%s] consume..' % queuename)
                consumer.wait()
            except Exception as err:
                print(err)
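The callback's decision logic can be tested without a broker: acknowledge every message, but start work only for `do == 'check'`. Here is a minimal sketch under two assumptions. `FakeMessage` is a hypothetical object that offers only the `ack()` method the slide uses. `launch` is an injected function that stands in for `subprocess.Popen([... 'scraper.py', task_id])`.

```python
class FakeMessage:
    """Hypothetical stand-in for a broker message; records acknowledgement."""

    def __init__(self):
        self.acked = False

    def ack(self):
        self.acked = True


def make_callback(launch):
    """Build a callback with the slide's dispatch rules; `launch` starts a task."""

    def call(message_data, message):
        msg = message_data.get("msg", {})
        # Ack up front, as on the slide: the broker will not redeliver the
        # message, even if the task later fails.
        message.ack()
        if msg.get("do") == "check":
            return launch(msg["task_id"])
        return None  # unknown command: acknowledged and dropped

    return call


if __name__ == "__main__":
    started = []

    def fake_launch(task_id):
        started.append(task_id)
        return 4242  # pretend PID

    call = make_callback(fake_launch)
    m = FakeMessage()
    print(call({"msg": {"do": "check", "task_id": "abc"}}, m), m.acked, started)
```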
  33. Cooking soup with BeautifulSoup
        import re
        from BeautifulSoup import BeautifulSoup

        def visible(element):
            # Skip text in non-rendered tags, HTML comments and &nbsp; nodes.
            if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
                return False
            if (re.search('<!--', str(element)) or re.search('-->', str(element))
                    or re.search('&nbsp;', str(element))):
                return False
            return True

        monitor = collection_monitor.find_one({'pk': pk})
        # The two stored snapshots of the monitored page
        contents = [collection_content.find({'url': str(monitor['url'])})[1],
                    collection_content.find({'url': str(monitor['url'])})[0]]
        for i in contents:
            texts = BeautifulSoup(BeautifulSoup(i['content']).prettify()).findAll(text=True)
            data = {'content': ' '.join(filter(visible, texts)),
                    'datetime': i['datetime']}
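For comparison, you can write the same "visible text only" filter with the standard library's `html.parser`. It skips text inside `style`, `script`, `head` and `title`, and ignores comments. This is a swapped-in technique: it does not use BeautifulSoup and is not the project's code. It is shown as a dependency-free sketch, and the class and function names are hypothetical. One difference from the slide: `&nbsp;` is turned into a regular space rather than dropping the whole text node.

```python
from html.parser import HTMLParser

HIDDEN_TAGS = {"style", "script", "head", "title"}


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside a hidden tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &nbsp; -> '\xa0', etc.
        self.hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        # Comments go to handle_comment(), which is not overridden,
        # so they never reach this method.
        if not self.hidden_depth:
            text = data.replace("\xa0", " ").strip()
            if text:
                self.chunks.append(text)


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(visible_text("<head><title>T</title></head><p>Hello&nbsp;world</p>"))
```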
  34. Alert by email!
        import smtplib

        def sending_email(to, sub, msg):
            try:
                gmail_user = 'romanticdevil.jimmy@gmail.com'
                gmail_pwd = '***************'
                smtpserver = smtplib.SMTP('smtp.gmail.com', 587)
                smtpserver.ehlo()
                smtpserver.starttls()
                smtpserver.ehlo()
                smtpserver.login(gmail_user, gmail_pwd)
                # A blank line must separate the headers from the body.
                header = ('To: ' + to + '\n' +
                          'From: Website-Monitoring <' + gmail_user + '>\n' +
                          'Subject: %s\n\n' % sub)
                msg = header + msg
                smtpserver.sendmail(gmail_user, to, msg)
                smtpserver.quit()
            except Exception as err:
                print(err)
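Building the header string by hand is easy to get wrong: the blank line between headers and body and the encoding of non-ASCII subjects are common pitfalls. Here is a minimal sketch of the same alert using the standard library's `email.message.EmailMessage`. The function names, the sender address and the `SMTP_USER`/`SMTP_PASSWORD` environment variables are assumptions. Building the message is kept separate from sending it, so the message can be checked without a mail server.

```python
import os
import smtplib
from email.message import EmailMessage


def build_alert(to, subject, body, sender="website-monitoring@example.com"):
    """Return an EmailMessage for a page-change alert."""
    msg = EmailMessage()
    msg["To"] = to
    msg["From"] = "Website-Monitoring <%s>" % sender
    msg["Subject"] = subject
    msg.set_content(body)  # handles the header/body separator and encoding
    return msg


def send_alert(msg, host="smtp.gmail.com", port=587):
    """Send over STARTTLS; credentials come from the environment, not the code."""
    user = os.environ["SMTP_USER"]
    password = os.environ["SMTP_PASSWORD"]
    with smtplib.SMTP(host, port) as server:  # quit() is called on exit
        server.starttls()
        server.login(user, password)
        server.send_message(msg)


if __name__ == "__main__":
    print(build_alert("user@example.com", "Page changed", "-old\n+new").as_string())
```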
  35. Task / scheduling check
        import sys
        import time

        # log() and main() are defined elsewhere in this script (not shown).
        task_id = sys.argv[1]
        print(task_id)
        raw_delay = collection_task.find_one({'task_id': task_id})['schedule']
        print(raw_delay)
        # Schedule is stored in hours: '1', '12', anything else means daily.
        if raw_delay == '1':
            delay = 60 * 60
        elif raw_delay == '12':
            delay = 720 * 60
        else:
            delay = 1440 * 60

        while True:
            try:
                print('[+] Starting task: %s' % sys.argv[1])
                log(task_id, 'INFO', 'starting session')
                main()
            except Exception as err:
                log(task_id, 'exception', err)
                print(err)
                collection_task.update({'task_id': task_id},
                                       {'$set': {'status': 'STOPPED', 'pid': None}})
                log(task_id, 'INFO', 'updating database [status:STOPPED]')
                break  # status is STOPPED, so leave the loop
            else:
                collection_task.update({'task_id': task_id},
                                       {'$set': {'status': 'SLEEP', 'pid': None}})
                log(task_id, 'INFO', 'updating database [status:SLEEP] for %s sec' % delay)
                time.sleep(delay)
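The schedule lookup at the top of this script maps a stored string (`'1'`, `'12'` or anything else) to a sleep interval in seconds. Here is a minimal sketch that makes the mapping and the run/sleep loop testable. The `sleep` and `max_runs` parameters are hypothetical additions that do not exist in the original script: `sleep` is injected, and `max_runs` bounds the loop. `job` stands in for `main()`. This sketch stops after the first failure, to match the STOPPED status.

```python
import time

# Hours between checks, keyed by the stored schedule string; default is daily.
SCHEDULE_HOURS = {"1": 1, "12": 12}
DEFAULT_HOURS = 24


def delay_seconds(raw_delay):
    """'1' -> 3600, '12' -> 43200, anything else -> 86400."""
    return SCHEDULE_HOURS.get(raw_delay, DEFAULT_HOURS) * 60 * 60


def run_schedule(job, raw_delay, sleep=time.sleep, max_runs=None):
    """Run `job`, sleep, repeat; stop on the first exception (status STOPPED)."""
    delay = delay_seconds(raw_delay)
    statuses = []
    runs = 0
    while max_runs is None or runs < max_runs:
        runs += 1
        try:
            job()
        except Exception:
            statuses.append("STOPPED")
            break
        statuses.append("SLEEP")
        sleep(delay)
    return statuses


if __name__ == "__main__":
    print(run_schedule(lambda: None, "1", sleep=lambda s: None, max_runs=2))
```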
  36. Django-Piston (a mini-framework for Django that is powerful for creating RESTful APIs) https://bitbucket.org/jespern/django-piston/wiki/Home
      ● Ties into Django's internal mechanisms.
      ● Supports OAuth out of the box (as well as Basic/Digest or custom auth).
      ● Doesn't require tying to models, allowing arbitrary resources.
      ● Speaks JSON, YAML, Python Pickle & XML (and HATEOAS).
      ● Ships with a convenient reusable library in Python.
      ● Respects and encourages proper use of HTTP (status codes, ...).
      ● Has built-in (optional) form validation (via Django), throttling, etc.
      ● Supports streaming, with a small memory footprint.
      ● Stays out of your way.
  37. How to?
      Include in urls.py:
        url(r'^api/', include('api.urls')),
      Include in settings.py:
        INSTALLED_APPS = (
            …
            'api',
        )
      Create a folder named api/ in the project directory with these files:
        api/
            handlers.py
            __init__.py
            urls.py
  38. REST API: urls.py
        from django.conf.urls.defaults import *
        from piston.resource import Resource
        from piston.authentication import HttpBasicAuthentication
        from api.handlers import *

        auth = HttpBasicAuthentication(realm='website-monitoring')
        ad = {'authentication': auth}

        main = Resource(handler=Main, **ad)
        monitor = Resource(handler=Monitor, **ad)

        urlpatterns = patterns('',
            url(r'^(?P<obj_id>[^/]+)/$', main),
            url(r'^monitor/(?P<obj_id>[^/]+)/$', monitor),
        )
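The resources are protected with `HttpBasicAuthentication`, so a client must send an `Authorization: Basic ...` header carrying the base64-encoded `user:password`. Here is a minimal client-side sketch using `urllib.request`. The host, credentials and function names are hypothetical. The path assumes the `^api/` include from the previous slide plus the `monitor/<obj_id>/` pattern above. `fetch_json` also assumes the API answers in JSON.

```python
import base64
import json
import urllib.parse
import urllib.request


def api_request(base_url, path, user, password, method="GET", data=None):
    """Build a Request carrying HTTP Basic credentials."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")
    headers = {"Authorization": "Basic " + token}
    body = None
    if data is not None:
        # Form-encoded body, e.g. PUT /api/monitor/create/ with a url field.
        body = urllib.parse.urlencode(data).encode("utf-8")
        headers["Content-Type"] = "application/x-www-form-urlencoded"
    return urllib.request.Request(base_url + path, data=body,
                                  headers=headers, method=method)


def fetch_json(req):
    """Send the request and decode a JSON response (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = api_request("http://localhost:8000", "/api/monitor/all/", "user", "secret")
    print(req.get_method(), req.full_url, req.get_header("Authorization"))
```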
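  39. REST API: handlers.py
        from piston.handler import BaseHandler
        from piston.utils import rc

        class Main(BaseHandler):
            allowed_methods = ('GET',)  # trailing comma: a tuple, not a string

            def read(self, request, obj_id):
                # Look the id up as a user first, then as a monitor.
                data = collection_user.find_one({'pk': obj_id})
                if data:
                    return data
                data = collection_monitor.find_one({'pk': obj_id})
                if data:
                    return data
                return rc.NOT_FOUND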
  40. handlers.py (continued)
        class Monitor(BaseHandler):
            allowed_methods = ('GET', 'PUT', 'DELETE')
            fields = ('url', 'status', 'hit', 'fail_hit', 'year', 'month', 'day',
                      'hour', 'email', 'period', 'diff')

            def read(self, request, obj_id):
                try:
                    if obj_id == 'all':
                        data = list(collection_monitor.find({'username': str(request.user)}))
                    elif obj_id == 'status_running':
                        data = list(collection_monitor.find({'status': 'running'}))
                    …
                except Exception as err:
                    return rc.BAD_REQUEST
                return data

            def update(self, request, obj_id):
                try:
                    if obj_id == 'create':
                        url_list = [i['url'] for i in
                                    collection_monitor.find({'username': str(request.user)})]
                        if request.PUT.get('url') in url_list:
                            print('[+] URL already exists')
                            print('[+] Data will be updated')
                    else:
                        raise Exception('unsupported id: %s' % obj_id)
                except Exception as err:
                    print(err)
                    return rc.BAD_REQUEST
                …
  41. Monitor handler (continued)
            def delete(self, request, obj_id):
                try:
                    if obj_id == 'all':
                        # One call removes every monitor owned by this user.
                        collection_monitor.remove({'username': str(request.user)})
                    else:
                        # Only let users delete their own monitors.
                        spec = {'pk': obj_id, 'username': str(request.user)}
                        if collection_monitor.find_one(spec):
                            collection_monitor.remove(spec)
                except Exception as err:
                    print(err)
                    return rc.FORBIDDEN
                else:
                    print('deleted')
                    return rc.DELETED
  42. Facebook integration?
      ● Just for lazy people
      ● You don't have to fill in the registration form: just log in to your Facebook account, then click, click & click.
      ● Good for business marketing
      ● Easy to integrate, etc.
      ● Download:
        git clone http://github.com/dickeytk/django_facebook_oauth.git
  43. Questions?
      ● Twitter: @jimmyromanticde
      ● Facebook: https://www.facebook.com/jimmy.romantic.devil
      ● Email: romanticdevil.jimmy@gmail.com
      ● Bitbucket: https://bitbucket.org/jimmyromanticdevil/
      ● Blog: http://jimmyromanticdevil.wordpress.com
  44. References
      ● http://www.python.org
      ● https://www.djangoproject.com
      ● http://www.mongodb.org
      ● http://www.rabbitmq.com
      ● http://pypi.python.org/pypi/pymongo
      ● https://github.com/ask/carrot/
      ● https://bitbucket.org/jespern/django-piston/wiki/Home
      ● http://github.com/dickeytk/django_facebook_oauth.git
      ● "Life in a Queue", Tareque Hossain
      ● Google: "Message Queue"
  45. Thank you! :)
