In this thesis, we worked on mining code idioms from software repositories. By the term idiom, we mean a small fragment of code that recurs across repositories and serves a specific semantic purpose. Idioms are characterized by their readability and by their reusability for performing a specific task. Experienced developers aspire to write idiomatic code, which leads to better-performing and more maintainable applications. The importance of idioms is reflected in the fact that integrated development environments such as Eclipse and IntelliJ provide dedicated tools that offer idioms to users. To address the problem of automatic idiom extraction, our research focuses on clustering code snippets from high-quality software projects. These projects were selected from GitHub, a hosting platform for Git repositories, based on their popularity. The snippets are represented as Abstract Syntax Trees (ASTs), which retain both the structural information of the code and its semantic information, such as variable and method names. Code fragments were compared using the pq-gram algorithm, a method for measuring the distance between trees. The most representative snippets of the resulting clusters were then converted into a generalized form that retains the semantic content of the code. The results of this procedure were evaluated on a test set and were very encouraging.
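To make the tree comparison step more concrete, the following is a minimal sketch of a pq-gram distance between two ordered labeled trees, following the commonly used definition: each tree is reduced to a bag (multiset) of pq-grams, and the distance is 1 minus twice the size of the bag intersection divided by the combined bag size. It is not the thesis implementation. The `Node` class, the helper names `pq_gram_profile` and `pq_gram_distance`, the defaults `p=2` and `q=3`, and the node labels such as `"If"` and `"Compare"` are all hypothetical choices made for illustration. The labels merely stand in for AST node types.

```python
from collections import Counter
from dataclasses import dataclass, field

DUMMY = "*"  # placeholder label used to pad ancestors and siblings


@dataclass
class Node:
    # Hypothetical minimal tree node: a label plus ordered children.
    label: str
    children: list["Node"] = field(default_factory=list)


def pq_gram_profile(root: Node, p: int = 2, q: int = 3) -> Counter:
    """Return the bag of pq-grams of a tree as a Counter of label tuples."""
    profile: Counter = Counter()

    def visit(node: Node, anc: tuple) -> None:
        # Shift the current label into the p-length ancestor register.
        anc = anc[1:] + (node.label,)
        sib = (DUMMY,) * q
        if not node.children:
            # A leaf yields a single pq-gram with q dummy children.
            profile[anc + sib] += 1
            return
        for child in node.children:
            # Slide a q-length window over the children.
            sib = sib[1:] + (child.label,)
            profile[anc + sib] += 1
            visit(child, anc)
        for _ in range(q - 1):
            # Pad the window with dummies after the last child.
            sib = sib[1:] + (DUMMY,)
            profile[anc + sib] += 1

    visit(root, (DUMMY,) * p)
    return profile


def pq_gram_distance(t1: Node, t2: Node, p: int = 2, q: int = 3) -> float:
    """Distance in [0, 1]: 0 for identical profiles, 1 for disjoint ones."""
    prof1 = pq_gram_profile(t1, p, q)
    prof2 = pq_gram_profile(t2, p, q)
    shared = sum((prof1 & prof2).values())  # bag intersection (min counts)
    total = sum(prof1.values()) + sum(prof2.values())  # bag union size
    return 1.0 - 2.0 * shared / total


# Two toy trees that differ only in the last child of the root.
t1 = Node("If", [Node("Compare"), Node("Return")])
t2 = Node("If", [Node("Compare"), Node("Raise")])

if __name__ == "__main__":
    print(round(pq_gram_distance(t1, t2), 4))  # 0.6667
```

In this toy pair, each tree has six pq-grams and two of them are shared, so the distance is 1 - 4/12. Because the profiles are computed independently, a clustering step could precompute one profile per snippet and compare the bags pairwise, which is cheaper than computing an exact tree edit distance.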