Xin (Sean) Hao
401 Shady Avenue, Kenmawr Apartment B607, Pittsburgh, PA 15206
Tel: (412) 708-4337 · Email: firstname.lastname@example.org
Carnegie Mellon University, Pittsburgh, USA May 2015
Master of Information Systems Management GPA: 3.88/4.33
Sun Yat-sen University, Guangzhou, China June 2013
Bachelor's Degree in Software Engineering GPA: 3.8/4.0
►SingTel, Singapore May - Aug. 2014
Data Analyst (Scientist) Intern, DataSpark Team
Datamart Project – Large-scale Data Aggregation, Architecture Designer, Primary Developer
Designed and implemented a key computing engine for the SingTel DataSpark team's geolocation data insight product.
Aggregated users' demographic data with telecom geolocation data and integrated the results by map subzone.
Reduced processing time for one day of data (billion-record scale) from 7 hours to 1 hour using Hadoop (MapR and Cloudera
distributions), saving 300 hours in total across the entire data calculation.
►Tencent, Shenzhen, China July - Sept. 2012
Backend Software Engineer Intern, Data Analysis Team
DataTrans Project – Data Pipeline, Architecture Designer, Main Developer
Designed and implemented a data migration and processing tool using Hadoop to handle terabyte-scale data.
Fetched data from MySQL, transferred it to the Hadoop file system (HDFS), and stored the results in HBase.
Built the data pipeline connecting HDFS, MySQL, and HBase, applying Java, Hadoop-based programming, HBase, and SQL
skills in this project.
Twitter Data Analysis Web Server on AWS Platform, Pittsburgh, USA Sept. 2014 – Dec. 2014
Analyzed 1 TB of tweet data on user relationships, retweets, locations, and hashtags using AWS Elastic MapReduce.
Imported the data into MySQL and HBase, and optimized the databases to reduce query response time.
Built web servers using the JBoss framework on Amazon EC2 with Elastic Load Balancing to withstand stress testing.
Using partitioning, caching, and load balancing techniques, the web servers handled more than 10,000
queries per second with 100% correctness.
Search Engine Implementation on ClueWeb09 Dataset, Pittsburgh, USA Jan. 2015 – Present
Design and implement a query parser that converts general query strings into search engine operations and terms.
Search a corpus of more than 3 GB through inverted lists and score lists using Ranked/Unranked Boolean, BM25, and Indri
algorithms to return high-precision, high-recall results in seconds.
Use object-oriented design patterns such as the Factory pattern to reuse code and keep the structure cohesive.
TwittStory Mobile App, Adelaide, Australia Sept. – Dec. 2013
Development Team, Developer
Developed a cross-platform, web-based mobile app for http://www.twittstory.com.
Implemented login, search, data fetching, and visual display features for this Twitter analysis mobile app.
Fields of Interest Hadoop, Cloud Platforms, Large-scale Data Mining, Machine Learning
Languages English, Chinese