Open Source Technologies
What is Open Source?
Simple: You can read the code.
           You can see how it's made.
Two main characteristics
     First, it's free (no cost).
     Second (much more important and interesting), it's free as in freedom.
Four Freedoms
* The freedom to run the program for any purpose
* The freedom to study how the program works, and adapt it to your needs
* The freedom to redistribute copies
* The freedom to improve the program
Why is this cool?
Anyone can do whatever they like with it.
Nobody owns it, everyone can use it, anyone can improve it.
It improves in terms of quantity of code (functionality):
people add layers on top of other people's code.
As the code base grows, the potential grows.
That improves the chances of it being used for something
the originator never intended.
What does it take to be a Web Developer?
HTML & PHP
Let's take a brief look at what a "Web Developer" is
And that was just the Ruby stack
Now back to the question
What does it take to be a Web Developer?
A Passion for Learning
LAMP
L: Linux
 * Very reliable OS
 * Extremely powerful
 * Performs well even with limited resources
 * Compelling graphics
 * Strong programming support
 * Scalable
 * No piracy issues
A: Apache
A web server can refer to either the hardware (the computer) or the
software (the computer application) that helps deliver web content
that can be accessed through the Internet.

The most common use of web servers is to host websites, but there
are other uses such as gaming, data storage, or running enterprise
applications.

Apache
 * Runs on all major platforms (Unix-like systems, Windows, Mac OS X,
   FreeBSD, and most others you can name)
 * Largest market share among web servers since 1996
M: MySQL
 * Relational database
 * One of the world's fastest-growing open source database servers
 * Fast performance, high reliability, and ease of use
 * Used on every continent -- yes, even Antarctica
 * Works on more than 20 platforms, including Linux, Windows, Mac OS X,
   HP-UX, AIX, and NetWare, to name a few
 * Supports multiple storage engines
P: PHP
 * Open source server-side scripting language designed specifically
   for the web
 * One of the most widely used languages on the web
 * Outputs not only HTML but also XML, images (JPG & PNG), PDF files,
   and even Flash movies (using libswf and Ming), all generated on the
   fly. It can also write these files to the filesystem.
 * Supports a wide range of databases (20+, plus ODBC)
 * Perl- and C-like syntax; relatively easy to learn
LAMP Overview
Let's CODE :)
Memcache
What is Caching?
A copy of real data with faster (and/or cheaper) access.

From Wikipedia: "A cache is a collection of data duplicating original
values stored elsewhere or computed earlier, where the original data
is expensive to fetch (owing to longer access time) or to compute,
compared to the cost of reading the cache."

MySQL query cache   : Cache in the DB
Disk                : File cache
In memory           : Memcached
What is Memcached?
Free and open source, high-performance, distributed memory object
caching system, generic in nature, but intended for use in speeding
up dynamic web applications by alleviating database load.

Memcached is an in-memory key-value store for small chunks of
arbitrary data (strings, objects) from results of database calls,
API calls, or page rendering.

Memcached is simple yet powerful. Its simple design promotes quick
deployment and ease of development, and solves many problems facing
large data caches. Its API is available for most popular languages.
Memcached Users

    Facebook
    Naukri
    LiveJournal
    Wikipedia
    Flickr
    Bebo
    Twitter
    TypePad
    YellowBot
    YouTube
    Digg
    WordPress.com
    Craigslist
    Mixi
Pattern

- Fetch from cache
- If there, return
- Else calculate, place in cache, return
Program
function get_foo(foo_id)

    foo = memcached_get("foo:" . foo_id)

    return foo if defined foo

    foo = fetch_foo_from_database(foo_id)

    memcached_set("foo:" . foo_id, foo)

    return foo

end
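The pseudocode above can be tried out without a running cache server. The following is a minimal Java sketch of the same fetch / return / else-compute-and-store pattern, with assumptions stated up front: a plain HashMap stands in for Memcached, and the class name CacheAsideDemo, the databaseCalls counter, and the lambda that plays the database are all hypothetical names made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Cache-aside sketch. A HashMap stands in for Memcached here; a real
// deployment would use a Memcached client talking to a server instead.
class CacheAsideDemo {
    private final Map<String, String> cache = new HashMap<>();  // stand-in for Memcached
    private final Function<Integer, String> database;           // hypothetical slow lookup
    int databaseCalls = 0;                                       // counts cache misses

    CacheAsideDemo(Function<Integer, String> database) {
        this.database = database;
    }

    String getFoo(int fooId) {
        String key = "foo:" + fooId;
        String foo = cache.get(key);      // fetch from cache
        if (foo != null) {
            return foo;                   // if there, return
        }
        databaseCalls++;
        foo = database.apply(fooId);      // else calculate,
        cache.put(key, foo);              // place in cache,
        return foo;                       // and return
    }

    public static void main(String[] args) {
        CacheAsideDemo demo = new CacheAsideDemo(id -> "row-" + id);
        System.out.println(demo.getFoo(7));  // miss: loaded from the "database"
        System.out.println(demo.getFoo(7));  // hit: served from the cache
        System.out.println("database calls: " + demo.databaseCalls);
    }
}
```

The second lookup never reaches the database function, which is the whole point of the pattern. What this sketch leaves out is expiry and invalidation, which a real cache needs as soon as the underlying data can change.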
Let's add Memcache to the CODE
GEARMAN?
MANAGER
Gearmand
- Daemon that manages the work.
- Does not do any work itself.
- Accepts a job ID and a binary payload from clients.
- Workers keep connections open at all times.
Client

- Clients connect to Gearmand and ask for work to be done.
- The client can fire and forget or wait for a response.
- Multiple jobs can be done asynchronously by workers for one client.
Workers

- A single worker can do just one job or can do many jobs.
- Does not have to be written in the same language as the clients.
An Example Client
# Create our client object.
$client = new GearmanClient();

# Add default server (localhost).
$client->addServer();

echo "Sending job\n";

# Send reverse job and wait for the result.
# doNormal() replaces do(), which is deprecated in newer releases of the extension.
$result = $client->doNormal("reverse", "Hello!");
if ($result) {
  echo "Success: $result\n";
}
An Example Worker
# Create our worker object.
$worker = new GearmanWorker();

# Add default server (localhost).
$worker->addServer();

# Register function "reverse" with the server.
$worker->addFunction("reverse", "reverse_fn");

while (1)
{
  print "Waiting for job...\n";
  $ret = $worker->work();
  if ($worker->returnCode() != GEARMAN_SUCCESS)
    break;
}

# A very simple reverse function
function reverse_fn($job)
{
  $workload = $job->workload();
  echo "Received job: " . $job->handle() . "\n";
  echo "Workload: $workload\n";
  $result = strrev($workload);
  echo "Result: $result\n";
  return $result;
}
NOSQL
Database paradigms

* Relational (RDBMS)

* NoSQL
  * Key­value stores
  * Document databases
  * Graph databases

* Others
Relational Databases
* ACID 
   Atomicity
   Consistency
   Isolation
   Durability

* SQL

* Mature
NoSQL
* No relational tables

* No fixed table schemas

* No joins

* No risk, no fun !

* Massive data stores

* Scaling is easy

* Simpler to implement 
Goodbye rows and tables, hello documents and collections
Lots of pretty pictures to fool you.
Noise
Introduction

MongoDB bridges the gap between key-value stores (which are fast and highly scalable) and
traditional RDBMS systems (which provide rich queries and deep functionality).

MongoDB is document-oriented, schema-free, scalable, high-performance, and open source. It is written in C++.

Unlike MySQL, MongoDB is not a relational database.

Goodbye rows and tables, hello documents and collections

Features

* Document-oriented
  * Documents (objects) map nicely to programming language data types
  * Embedded documents and arrays reduce the need for joins
  * No joins and no multi-document transactions, for high performance and easy scalability

* High performance
  * No joins, plus embedding, make reads and writes fast
  * Indexes, including indexing of keys from embedded documents and arrays

* High availability
  * Replicated servers with automatic master failover

* Easy scalability
  * Automatic sharding (auto-partitioning of data across servers)
    * Reads and writes are distributed over shards
    * No joins or multi-document transactions make distributed queries easy and fast
  * Eventually consistent reads can be distributed over replicated servers
Why?

* Cost: MongoDB is free
* MongoDB is easy to install
* MongoDB supports various programming languages, such as C, C++, Java, JavaScript, and PHP
* MongoDB is blazingly fast
* MongoDB is schemaless
* Ease of scale-out: if load increases, it can be distributed to other nodes across computer networks
* It's trivially easy to add more fields, even complex fields, to your objects, so as requirements change you can adapt code quickly
* Background indexing
* MongoDB is a stand-alone server
* Development time is faster, too, since there are no schemas to manage
* It supports server-side JavaScript execution, which allows a developer to use a single programming language for both client-side and server-side code
Limitations

* Mongo is limited to a total data size of 2GB for all databases in 32-bit mode
* No referential integrity
* Data size in MongoDB is typically higher (field names are stored in every document)
* At the moment, map/reduce (e.g. for aggregations and data analysis) is OK, but not blisteringly fast
* Group by: limited to fewer than 10,000 keys. For larger grouping operations without limits, use map/reduce.
* Lack of a predefined schema is a double-edged sword
* No support for joins and transactions
Mongo data model

* A Mongo system (see deployment above) holds a set of databases
* A database holds a set of collections
* A collection holds a set of documents
* A document is a set of fields
* A field is a key-value pair
* A key is a name (string)
* A value is a
  * basic type like string, integer, float, timestamp, binary, etc.,
  * a document, or
  * an array of values
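To make the nesting described above concrete, here is a small Java sketch that builds one document out of ordinary collections. It does not use any MongoDB driver; the class name DocumentModelDemo and the field names (title, views, author, tags) are hypothetical and chosen only to show one value of each kind.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain Java collections standing in for a Mongo document; no driver is involved.
class DocumentModelDemo {
    static Map<String, Object> buildPost() {
        Map<String, Object> author = new LinkedHashMap<>();  // an embedded document
        author.put("name", "alice");
        author.put("karma", 42);

        Map<String, Object> post = new LinkedHashMap<>();    // the document: a set of fields
        post.put("title", "Hello documents");                // value of a basic type (string)
        post.put("views", 10);                               // value of a basic type (integer)
        post.put("author", author);                          // value that is itself a document
        post.put("tags", Arrays.asList("nosql", "mongodb")); // value that is an array
        return post;
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        Map<String, Object> post = buildPost();
        Map<String, Object> author = (Map<String, Object>) post.get("author");
        List<String> tags = (List<String>) post.get("tags");
        System.out.println(author.get("name"));
        System.out.println(tags);
    }
}
```

Because the author and the tags live inside the post itself, reading the post needs no join, which is the trade-off the document model makes.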


MySQL Term      Mongo Term
----------      ----------
database        database
table           collection
index           index
SQL to Mongo Mapping Chart
(chart not reproduced; columns: SQL Statement, Mongo Statement)
Debugging & Profiling
Why & How?


* Bugs are bad

* Locate issues during runtime

* Speed up issue resolution

* Breakpoints

* Xdebug
Xdebug
Xdebug is a PHP extension that aims to lend a helping hand in the
process of debugging your applications. Xdebug offers features like:

    * Automatic stack trace upon error
    * Function call logging
    * Display features such as enhanced var_dump() output and code
      coverage information

  - Open source
  - Free
Enabling Xdebug in php.ini


 zend_extension="/usr/lib/php5/20090626+lfs/xdebug.so"
 xdebug.remote_enable=1
 xdebug.remote_host="127.0.0.1"
 xdebug.remote_port=9000
 xdebug.profiler_enable=1
 xdebug.show_local_vars=On
 xdebug.trace_output_dir="/tmp/xprofile/"
 xdebug.trace_output_name=%t.trace
 xdebug.profiler_output_name=%s.%t.profile
 xdebug.profiler_output_dir="/tmp/xprofile/"
Lucene
Apache Lucene is a free/open source information retrieval software
library, originally created in Java by Doug Cutting.
Scalable, High-Performance Indexing

   * small RAM requirements
   * incremental indexing as fast as batch indexing
   * index size roughly 20-30% the size of text indexed

Powerful, Accurate and Efficient Search Algorithms

   * ranked searching -- best results returned first
   * many powerful query types: phrase queries, wildcard
     queries, proximity queries, range queries and more
   * fielded searching (e.g., title, author, contents)
   * date-range searching
   * sorting by any field
   * multiple-index searching with merged results
   * allows simultaneous update and searching

Cross-Platform Solution

   * Available as open source software under the Apache
     License, which lets you use Lucene in both commercial
     and open source programs
   * 100% pure Java
   * Implementations in other programming languages
     available that are index-compatible
Pitfalls

   * Update = Delete + Add
   * No partial document update
   * No joins
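The "Update = Delete + Add" pitfall is easier to see on a toy index. The sketch below is not Lucene and uses none of its classes; TinyIndexDemo and its methods are hypothetical names for a deliberately naive inverted index. It shows why changing one field of a document means removing the old entry and re-adding the whole document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index, NOT Lucene: it only illustrates update = delete + add.
class TinyIndexDemo {
    private final Map<String, Set<String>> postings = new HashMap<>(); // term -> document ids
    private final Map<String, String> stored = new HashMap<>();        // document id -> text

    void add(String id, String text) {
        stored.put(id, text);
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    void delete(String id) {
        String old = stored.remove(id);
        if (old == null) {
            return; // nothing was indexed under this id
        }
        for (String term : old.toLowerCase().split("\\s+")) {
            Set<String> ids = postings.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    postings.remove(term);
                }
            }
        }
    }

    // There is no in-place edit: the old document is removed and the new one re-indexed.
    void update(String id, String text) {
        delete(id);
        add(id, text);
    }

    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyIndexDemo index = new TinyIndexDemo();
        index.add("doc1", "open source search");
        index.update("doc1", "full text search");
        System.out.println(index.search("open"));   // prints [] because the old terms are gone
        System.out.println(index.search("search")); // prints [doc1]
    }
}
```

The caller must supply the complete new text on every update, which is also why a partial document update is not available in this model.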
Code: FS Indexer
// Written against the older Lucene API used on these slides.
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
    private IndexWriter writer;

    // Open (and create) a file-system index in indexDir.
    public Indexer(String indexDir) throws IOException {
      Directory dir = FSDirectory.open(new File(indexDir));
      writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT), true,
                               IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws IOException {
      writer.close();
    }

    // Index every file in dataDir that the filter accepts (a null filter accepts all).
    public void index(String dataDir, FileFilter filter) throws Exception {
      File[] files = new File(dataDir).listFiles(filter);
      for (File f : files) {
        Document doc = new Document();
        doc.add(new Field("contents", new FileReader(f)));
        doc.add(new Field("filename", f.getName(),
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
      }
    }
}
Code: Searcher
public void search(String indexDir, String q) throws IOException,
    ParseException {
    Directory dir = FSDirectory.open(new File(indexDir));
    IndexSearcher is = new IndexSearcher(dir, true);

    QueryParser parser = new QueryParser("contents",
        new StandardAnalyzer(Version.LUCENE_CURRENT));
    Query query = parser.parse(q);
    TopDocs hits = is.search(query, 10);
    System.err.println("Found " + hits.totalHits + " document(s)");

    for (int i = 0; i < hits.scoreDocs.length; i++) {
      ScoreDoc scoreDoc = hits.scoreDocs[i];
      Document doc = is.doc(scoreDoc.doc);
      System.out.println(doc.get("filename"));
    }

    is.close();
}
Open source Technology