
Today’s reality Hadoop with Spark- How to select the best Data Science approach when using Big Data Platforms and Technologies?

Martin Oberhuber and Eliano Marques, Senior Data Scientists @Think Big International

In this talk, Think Big International's lead data scientists discuss the options available today to engineering and data science teams that want to use big data patterns to solve new business problems. With enterprise adoption of the Hadoop ecosystem and the growing momentum of open source projects like Spark, teams need an approach that delivers business results yet stays flexible enough to adapt as the open source market changes.

Published in: Technology

  1. DATA SCIENCE, DATA ENGINEERS, DATA SOLUTIONS
     Think Big, Start Smart, Scale Fast
     Eliano Marques, Senior Data Scientist
     Martin Oberhuber, Senior Data Scientist
  2. Think Big History
     1st SI Solution Provider with 100% focus on open source and Big Data Hadoop ecosystem
     • 100+ Successful Programs
     • 70+ Clients
     • Global Delivery Capabilities
     • We are hiring
     (Slide footer: CONFIDENTIAL | © 2015 Think Big, a Teradata Company)
  3. Think Big Clients
     Trusted Analytics Services Provider to the Fortune 1000
     • eCommerce: 2 of Global Top 5
     • Internet Transaction Security: Global #1
     • Retail: 2 of Global Top 5
     • Brokerage & Mutual Funds: 2 of Global Top 5
     • Social Networking: Global #1
     • Asset Management: Global #1
     • Credit Issuer: 2 of Global Top 5
     • Semiconductor: 2 of Global Top 5
     • Banking: 4 of Global Top 10
     • Data Storage Devices: 3 of Global Top 5
     • Financial Data Services: 2 of Global Top 5
     • Disk Manufacturing: Global #1
     • Financial Exchanges: Global #2
     • Telecommunications: 2 of Global Top 5
     • Media & Advertising: 2 of Global Top 5
  4. Think Big VELOCITY Methodology
     • Big Data Strategy
     • Think Big Academy
     • Big Data Program Mgt
     • Business Analytics
     • Managed Services
     • Data Engineering
     • Big Data Lab
     Think Big engages with its clients' business, technical, analyst and support teams in an agile-inspired VELOCITY Methodology to continuously develop Big Data solutions.
  5. What is Apache Spark?
     • Open source Apache project
       − Parallel middleware for server clusters
       − spark.apache.org (2014)
     • Developed by UC Berkeley’s AMPLab
       − Supported by Databricks
     • Top use cases
       − SQL-on-Hadoop
       − Machine learning
       − Streaming data mini-batches
  6. What is Apache Spark?
     • Apache Spark Core Engine
     • Spark SQL
     • Spark Streaming
     • MLlib (Machine learning)
     • GraphX (Graph)
     • Scala, R (SparkR), Python (PySpark)
  7. Data Science Approaches
     Single Workstation
       - Small data sets
       - No distributed analytics across multiple nodes
       - Powerful tools are R or Python
       - Data scientists can focus on the business problem
     Mixed Single Workstation / Cluster
       - Small or large data sets
       - Data wrangling and feature engineering are performed on the cluster
       - Predictive analysis and modeling can be performed on a single workstation
       - Powerful tools are Hadoop Streaming and Spark combined with R and Python
       - Data scientists now have to worry about parallelisation of some data mining tasks (usually the ones that are embarrassingly parallel)
     Cluster
       - Large data sets
       - Both data wrangling and modeling are performed on the cluster
       - Spark is one of the few tools that support efficient parallel machine learning
       - Parallelising machine learning algorithms is challenging
  8. Productionising Analytics
     Diagram labels: Data Sources (Operations, Sales, Marketing, etc.); Ingestion; Data Lake (HDFS); Core Data Science; Production; Data; Real-time Data; Real-time Optimization with Multi-armed Bandit; Plug & play model deployment
     Production
     • Dashboards
     • R Shiny Apps
     • Predictive model scoring
     Key points
     • Integration of R and Python with Hadoop and Spark
     • Leveraging computing power of Hadoop cluster for distributed analytics
     • Plug & play model deployment tools for easy and robust productionising of analytics models
  9. Data Science and Analytics Overview
     Data Science Project
     • Project Kick-off
     • Data Profiling and Exploratory Analysis
     • Analytics Modeling
     • Model Validation
     • Model Publishing
     • Reporting
  10. We leverage our expertise across industries
     • Dynamic Pricing
     • Fraud Detection
     • Customer Segmentation
     • Recommendation Engine
     • Predictive Asset Maintenance
     • Proactive Customer Support
     • Credit Default Prediction
     • Churn Modeling
     • Scenario Simulation
     • A/B Testing
     • Display Targeting Optimisation
     • Demand Forecast
     • Cluster Analysis & Segmentation
     • Device Analytics
     • Risk Analytics
     • Customer Analytics
  11. Thank you
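
Slide 7 notes that in the mixed approach data scientists have to parallelise some data mining tasks, usually the embarrassingly parallel ones. As a rough illustration of what "embarrassingly parallel" means, the sketch below fits one small least-squares model per customer segment, with no communication between the fits. Everything here is an assumption made for illustration: the segment names, the toy data, and the helper names `fit_line` and `fit_all_segments` do not come from the slides, and a local thread pool merely stands in for cluster workers (CPU-bound work would normally run in separate processes or on a cluster such as Spark).

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_line(points):
    """Ordinary least-squares fit y = intercept + slope * x for one segment."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_all_segments(data_by_segment, max_workers=4):
    """Fit every segment independently; no fit depends on any other.

    The thread pool is only a stand-in for cluster workers (an assumption
    of this sketch, not something the slides prescribe).
    """
    segments = list(data_by_segment)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        fits = list(pool.map(fit_line, (data_by_segment[s] for s in segments)))
    return dict(zip(segments, fits))


# Hypothetical toy data: (x, y) observations per segment.
data = {
    "north": [(0, 1.0), (1, 3.0), (2, 5.0)],
    "south": [(0, 4.0), (1, 3.0), (2, 2.0)],
}
models = fit_all_segments(data)
for segment, (intercept, slope) in models.items():
    print(segment, intercept, slope)
```

Because each segment is fitted in isolation, the same function could be handed to any map-style executor without changing the modeling code, which is what makes this class of task the easy one to distribute.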

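Slide 8 mentions real-time optimization with a multi-armed bandit but does not say which strategy is used. The sketch below assumes a simple epsilon-greedy strategy purely for illustration: with probability `epsilon` it explores a random option, otherwise it exploits the option with the best average reward so far. The class name, the conversion rates, and the simulation loop are all hypothetical and are not taken from the talk.

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit (an assumed strategy, see lead-in)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self._rng = random.Random(seed)

    def select_arm(self):
        # Explore a random arm with probability epsilon, otherwise exploit.
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean reward for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Hypothetical simulation: three offers with made-up conversion rates.
true_rates = [0.05, 0.10, 0.30]
bandit = EpsilonGreedyBandit(n_arms=len(true_rates), epsilon=0.1, seed=42)
environment = random.Random(0)
for _ in range(5000):
    arm = bandit.select_arm()
    reward = 1.0 if environment.random() < true_rates[arm] else 0.0
    bandit.update(arm, reward)

print("pulls per arm:", bandit.counts)
print("estimated rates:", [round(v, 3) for v in bandit.values])
```

In a production setting the `update` call would be driven by real-time feedback rather than a simulated environment, and the choice of strategy (epsilon-greedy, UCB, Thompson sampling) would depend on the use case.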