projects_with_descriptions

- projects completed
- Helped with POC to determine which Hadoop Environment we'd use (Cloudera vs
Hortonworks)
- Did performance testing
- Applied 2 business cases and saw performance differences with BI Tools
(Tableau)
- Google Adwords to sql server automation with python
- wrote custom libraries that pulled google adwords data into sql server
- saved marketing the 2 hours a day they had spent pulling data manually for
reports
- SQL Server to Google Big Query
- pulled various data sets with custom python libraries from sql server to
google big query
- Google Calendar to sql server for PTO tracking tool
- created calendar that employees put their pto data into and pulled the data
down using google apis
- helped track pto for employees/departments
- SFTP to SQL Server (paylocity and radius HR data)
- pulled data from sftp servers with python and loaded them to sql server
then to big query
- Relational DB to Hadoop library (FeastMode)
- wrote python library that pulled data from relational databases (sql server,
mysql, etc.) into hadoop, driven by configurations and by scanning the metadata
of the databases
- saved time, since pulling a whole database was as simple as typing 1 line
of code; the data loaded into hadoop as parquet-formatted, snappy-compressed
impala tables for reporting.
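The metadata-scanning idea can be sketched roughly as follows. This is not the FeastMode code: it is a minimal illustration that uses Python's built-in sqlite3 as a stand-in for sql server or mysql, and the config keys, function names, and table names are all assumptions made up for the example. It only builds a load plan (one entry per table found in the metadata) and stops short of writing anything to hadoop.

```python
import sqlite3

# Hypothetical configuration; every key and value here is an assumption.
CONFIG = {
    "target_db": "reporting",
    "file_format": "parquet",
    "compression": "snappy",
    "exclude_tables": ["audit_log"],
}


def list_tables(conn):
    """Read table names from the database's own metadata catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def describe_table(conn, table):
    """Return (column_name, declared_type) pairs for one table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(row[1], row[2]) for row in rows]


def build_load_plan(conn, config):
    """Turn scanned metadata plus config into one load step per table."""
    plan = []
    for table in list_tables(conn):
        if table in config["exclude_tables"]:
            continue
        columns = describe_table(conn, table)
        column_list = ", ".join(name for name, _ in columns)
        plan.append({
            "source_table": table,
            "target_table": f'{config["target_db"]}.{table}',
            "columns": columns,
            "select_sql": f"SELECT {column_list} FROM {table}",
            "file_format": config["file_format"],
            "compression": config["compression"],
        })
    return plan


def demo_connection():
    """Build a small in-memory database to scan (illustrative tables only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER, name TEXT);
        CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
        CREATE TABLE audit_log (id INTEGER, note TEXT);
    """)
    return conn


if __name__ == "__main__":
    for step in build_load_plan(demo_connection(), CONFIG):
        print(step["source_table"], "->", step["target_table"])
```

The one-line usage is the call to `build_load_plan`; a real loader would hand each step to whatever writes the parquet files and creates the impala tables.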
- Salesforce Python Library
- wrote a custom, configuration-driven python library that used the Salesforce
Bulk REST api to pull data from salesforce into hadoop and then eventually to
the legacy sql server system
- also had an incremental loading feature which sped up the data load and
made intraday (hourly) reporting possible
- used lambda architecture to make incremental loads possible in hive
- saved the company from paying an extra $1200 a year for each license of DB
Amp
- there was no 3rd party tool to pull data from salesforce into hadoop
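
One common way to reconcile incremental loads is to keep a base snapshot plus a batch of recent changes and pick the newest version of each record. The sketch below shows that pattern in plain Python. It is an assumption about the general approach, not the library's actual code or its hive implementation; `Id` and `SystemModstamp` are standard salesforce fields, while the function names and sample rows are invented for the example.

```python
def merge_incremental(base_rows, delta_rows, key="Id", modified="SystemModstamp"):
    """Combine a base snapshot with newer changes, keeping the latest row per key.

    Timestamps are ISO-8601 strings in one fixed format, so plain string
    comparison orders them chronologically.
    """
    latest = {}
    for row in list(base_rows) + list(delta_rows):
        current = latest.get(row[key])
        # Later rows win ties, so a delta row replaces an equal-timestamp base row.
        if current is None or row[modified] >= current[modified]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


def next_watermark(rows, previous, modified="SystemModstamp"):
    """Return the timestamp the next incremental pull should start from."""
    return max([previous] + [row[modified] for row in rows])


# Illustrative data only.
BASE = [
    {"Id": "001", "Name": "Acme", "SystemModstamp": "2016-01-01T00:00:00Z"},
    {"Id": "002", "Name": "Globex", "SystemModstamp": "2016-01-01T00:00:00Z"},
]
DELTA = [
    {"Id": "001", "Name": "Acme Corp", "SystemModstamp": "2016-01-02T09:00:00Z"},
    {"Id": "003", "Name": "Initech", "SystemModstamp": "2016-01-02T09:30:00Z"},
]

if __name__ == "__main__":
    for row in merge_incremental(BASE, DELTA):
        print(row)
    print(next_watermark(DELTA, "2016-01-01T00:00:00Z"))
```

Only rows changed since the last watermark need to be pulled each hour, which is what makes the load faster than a full refresh.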

- Zuora Python Library
- wrote a custom, configuration-driven python library that used the Zuora REST
api to pull data from zuora into hadoop
- also had an incremental loading feature which sped up the data load and
made intraday (hourly) reporting possible
- used lambda architecture to make incremental loads possible in hive
- there was no 3rd party tool to pull data from zuora into hadoop
- project was done because legacy system wasn't working correctly
- Consuming Rabbit MQ messages
- built python libraries that read messages and stored the data in a reportable
format, or used the messages to trigger jobs
- example: live offer code redemptions for real time reporting.
- Built Company KPI's
- worked closely with finance to build company KPIs
- used mostly local datasets and netsuite data
- eventually included zuora and salesforce data
- used complex sql in Impala
- Built Company Billings dataset
- worked closely with marketing/finance to build billings data set that would
be the source of truth for financial reporting for the company
- used salesforce and zuora data that was pulled with python libraries as well
as netsuite data
- Built Subscriber Snapshot dataset
- with direction from the data science team, helped build a subscriber
snapshot data set at daily and monthly granularity.
- included each user's subscription info, usage, profile, and demographics,
which were then used to segment the subscribers.
- Set Up Airflow workflow tool and used it for job automation
- set up airflow on linux machine and wrote custom DAGs using python
scripts for job scheduling.
- Used Cron for job scheduling
- used cron in linux to schedule jobs (eventually moved jobs to airflow)
- S3 to Hadoop
- pulled various data from amazon S3 and loaded the data into hadoop
- could be from the dev product team or a different part of the company
(Code School, Digital Tutors, etc.)

- Searchlight, Wootric, Desk and other Rest API's with Python
- created custom python libraries that pulled data from 3rd party vendors'
REST APIs and loaded it into hadoop for reporting purposes
- Usually JSON responses
- Used External Definitions on top of json files to create tables
- used views with complex logic (lateral view explodes, etc.) to extract the
necessary data in a reportable format in Impala
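For readers unfamiliar with lateral view explodes, the effect is to turn one record holding a nested array into one flat row per array element. The sketch below shows that idea in Python rather than SQL. The original work was done in views, so this is only an illustration of the transformation, and the field names in the sample response are invented.

```python
import json


def explode(record, array_field):
    """Return one flat row per element of record[array_field].

    Like an explode without OUTER, a record whose array is missing or
    empty produces no rows at all.
    """
    items = record.get(array_field) or []
    parent = {k: v for k, v in record.items() if k != array_field}
    rows = []
    for item in items:
        row = dict(parent)
        if isinstance(item, dict):
            # Prefix nested keys so they cannot collide with parent columns.
            row.update({f"{array_field}_{k}": v for k, v in item.items()})
        else:
            row[array_field] = item
        rows.append(row)
    return rows


# Hypothetical JSON response from a vendor API.
RAW = """
{"survey_id": 7,
 "responses": [{"score": 9, "user": "a"}, {"score": 4, "user": "b"}]}
"""

if __name__ == "__main__":
    for row in explode(json.loads(RAW), "responses"):
        print(row)
```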
- Kafka to hadoop and hadoop to kafka python libraries
- changed configs to let data from Kafka dispatchers go to hdfs
- then loaded those records into hadoop
- created dynamic dsl code that sent data from hdfs folders into kafka topics
- Created company wide hourly office fitness slack channel automated with python
- for a hackday project created a python library that used the slack REST apis
to notify employees to get out of their seats and do an hourly workout.
- it let them self check in if they did the workout or not and there was a
tableau dashboard that tracked the company's progress
- Created python library that used slack for etl alerts
- created python library that sent a slack notification through a slack
webhook when jobs failed
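
A failure alert of this kind can be sketched with the standard library alone. Slack incoming webhooks accept a JSON body with a `text` field sent by HTTP POST. Everything else here is an assumption for illustration: the function names, the message wording, and the webhook URL, which must be replaced with a real one before `send_alert` will work.

```python
import json
import urllib.request


def build_alert(job_name, error):
    """Build the JSON payload for a failed-job message."""
    return {"text": f"ETL job `{job_name}` failed: {error}"}


def build_request(webhook_url, payload):
    """Wrap the payload in a POST request without sending it."""
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(webhook_url, job_name, error, timeout=10):
    """Post the alert to the webhook and return the HTTP status code."""
    request = build_request(webhook_url, build_alert(job_name, error))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_with_alert(job, job_name, webhook_url):
    """Run a job callable; on any exception, alert slack and re-raise."""
    try:
        return job()
    except Exception as exc:
        send_alert(webhook_url, job_name, exc)
        raise
```

Re-raising after the alert keeps the scheduler's own failure handling (cron mail, airflow retries) intact.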
