Big Data Architecture and Cluster Optimization with Python
By: Chetan Khatri
Principal Big Data Engineer, Nazara Technologies.
Data Science & Machine Learning Curricula Advisor,
University of Kachchh, Gujarat.
PyCon India 2016
Data Analytics Cycle
- Understand the business
- Understand the data
- Cleanse the data
- Analyze the data
- Make predictions from the data
- Visualize the data
- Build insights that help grow business revenue
- Explain them to executives (CxO)
- Make decisions
- Increase revenue
Capacity Planning (Cluster Sizing)
- Telecom business:
  - 122 operators, 4 regions (India, Africa, Middle East, Latin America)
  - 12 TB of data per year
  - 11,00,000 (1.1 million) transactions per day
- Gaming business:
  - 6 billion events per month, roughly 15 TB of data per year
- Total: 27 TB of data per year
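The figures above can be turned into a few derived sizing numbers. The following is a minimal Python sketch under assumptions the slide does not state: decimal terabytes (10^12 bytes), transactions spread evenly over the day (so the result is an average, not a peak TPS), and storage on HDFS with its default replication factor of 3.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # assumption: decimal terabytes, the slide does not say TB vs TiB


def sizing_summary(replication_factor=3):
    """Derive sizing numbers from the capacity-planning figures.

    replication_factor=3 is an assumption (the HDFS default), not from the slide.
    """
    telecom_tb_per_year = 12
    gaming_tb_per_year = 15
    total_tb_per_year = telecom_tb_per_year + gaming_tb_per_year

    # 11,00,000 in Indian digit grouping is 1,100,000 transactions per day.
    telecom_tx_per_day = 1_100_000
    avg_tps = telecom_tx_per_day / SECONDS_PER_DAY  # average only; peaks are higher

    # 6 billion events per month, 12 months per year.
    gaming_events_per_year = 6_000_000_000 * 12
    bytes_per_event = gaming_tb_per_year * TB / gaming_events_per_year

    # Raw disk needed per year once every block is stored replication_factor times.
    raw_storage_tb_per_year = total_tb_per_year * replication_factor

    return {
        "total_tb_per_year": total_tb_per_year,
        "avg_tps": avg_tps,
        "bytes_per_event": bytes_per_event,
        "raw_storage_tb_per_year": raw_storage_tb_per_year,
    }


if __name__ == "__main__":
    for name, value in sizing_summary().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the telecom load averages under 13 transactions per second, and each gaming event is on the order of 200 bytes; the replicated raw storage figure is what the disks actually have to hold.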
Predictive Modeling Cycle
1. Data quality (remove noisy and missing data)
2. Feature engineering
3. Choose the best model based on the nature of the data. For example, for continuous target values use Linear Regression; for binary (binomial) categorical prediction use Logistic Regression; for random sampling of data and features (feature randomization) with better generalization performance use Random Forest; alternatively, use Gradient Boosting Trees, which build an optimal linear combination of trees and predict with a weighted sum of the individual trees' predictions.
   Try models ranging from Linear Regression to Deep Learning (RNN, CNN).
4. Ensemble models (Regression + Random Forest + XGBoost)
5. Tune hyperparameters (for example, in a deep neural network, tune the mini-batch size, learning rate, number of epochs, and number of hidden layers)
6. Model compression: port the model to embedded/mobile devices by compressing its weight matrices (sparsify, shrink, break, quantize)
7. Run on a smartphone
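Step 6 lists quantization as one way to compress weight matrices. The following is a minimal sketch of symmetric 8-bit linear quantization on a plain Python matrix. The scheme (one scale per matrix, codes in [-127, 127]) is an assumption chosen for illustration; the slide does not say which quantization method was used.

```python
def quantize_int8(matrix):
    """Symmetric per-matrix int8 quantization: w is approximated by q * scale.

    Every weight is divided by one shared scale and rounded to an integer
    code in [-127, 127], which fits in a signed byte.
    """
    max_abs = max((abs(v) for row in matrix for v in row), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    quantized = [
        [max(-127, min(127, round(v / scale))) for v in row]
        for row in matrix
    ]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [[q * scale for q in row] for row in quantized]


if __name__ == "__main__":
    weights = [[0.5, -1.27], [0.0, 0.3]]
    codes, scale = quantize_int8(weights)
    print(codes, scale)
    print(dequantize(codes, scale))
```

Each weight then needs one byte instead of four for float32 (about 4x smaller, plus one stored scale), and the rounding error per weight is at most half the scale.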
Big Data Cluster Tuning – OS Parameters
TPS (transactions per second): throughput of every job.
TCP TIME_WAIT interval, e.g. 4 minutes
Maximum number of local ports
Maximum number of connections
sysctl net.ipv4.ip_local_port_range
sysctl net.ipv4.tcp_fin_timeout
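A back-of-the-envelope sketch of why TPS, the TIME_WAIT interval, and the local port range interact. It assumes every transaction opens a new outbound connection to the same destination IP and port, so each local ephemeral port stays unusable for the whole TIME_WAIT interval, and it ignores port-reuse options such as net.ipv4.tcp_tw_reuse. The range 32768 to 60999 below is only illustrative input; read the real one with `sysctl net.ipv4.ip_local_port_range`.

```python
def parse_port_range(sysctl_output):
    """Parse output such as 'net.ipv4.ip_local_port_range = 32768\t60999'."""
    _key, _sep, value = sysctl_output.partition("=")
    low, high = (int(part) for part in value.split())
    return low, high


def max_new_connections_per_second(low, high, time_wait_seconds):
    """Upper bound on new outbound connections/s to ONE destination (ip, port).

    Each connection holds one ephemeral port for time_wait_seconds after it
    closes, so at most (number of ports / TIME_WAIT) can be opened per second.
    """
    ephemeral_ports = high - low + 1
    return ephemeral_ports / time_wait_seconds


if __name__ == "__main__":
    low, high = parse_port_range("net.ipv4.ip_local_port_range = 32768\t60999")
    rate = max_new_connections_per_second(low, high, time_wait_seconds=4 * 60)
    print(f"{high - low + 1} ports -> about {rate:.1f} new connections/s")
```

With a 4-minute TIME_WAIT this bound is only about 118 new connections per second, which is why a high-TPS job either widens the port range, shortens the wait where the kernel allows it, or reuses connections.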
Max threads: sysctl -a | grep threads-max
echo 120000 > /proc/sys/kernel/threads-max
echo 600000 > /proc/sys/vm/max_map_count
cat /proc/sys/kernel/threads-max
Number of threads = total virtual memory (bytes) / (stack size (MB) * 1024 * 1024)
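The thread-count formula above can be sketched as a small Python helper (the function names are hypothetical). The standard-library `resource` module reports the stack limit in bytes (`ulimit -s` shows KiB), so the helper converts it to MiB before applying the formula.

```python
import resource

MIB = 1024 * 1024


def estimate_max_threads(total_virtual_memory_bytes, stack_size_mb):
    """Number of threads = total virtual memory / (stack size in MB * 1024 * 1024)."""
    return total_virtual_memory_bytes // (stack_size_mb * MIB)


def current_stack_size_mb():
    """Soft RLIMIT_STACK (the value behind `ulimit -s`) in MiB, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft / MIB


if __name__ == "__main__":
    stack_mb = current_stack_size_mb()
    print("stack size (MiB):", stack_mb)
    if stack_mb:
        # 8 GiB of virtual memory is an illustrative value, not from the slide.
        print("estimated max threads:", estimate_max_threads(8 * 1024**3, stack_mb))
```

For example, 8 GiB of virtual memory with an 8 MB stack gives 1024 threads; halving the stack size doubles the estimate.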
java.lang.OutOfMemoryError: Java heap space
- List RAM: free -m
- Storage: df -h
- ulimit -s   # stack size
- ulimit -v   # virtual memory
- echo 120000 > /proc/sys/kernel/threads-max
- echo 600000 > /proc/sys/vm/max_map_count
- echo 200000 > /proc/sys/kernel/pid_max
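To confirm the `echo` commands took effect, here is a minimal Python sketch that reads the same three files back and reports which limits are still below the values used above. The `base_dir` parameter and the helper names are hypothetical, added so the check can run against a copy of the files; on a live Linux host the default `/proc/sys` is used.

```python
import os

# Files relative to /proc/sys, with the target values from the commands above.
KERNEL_LIMITS = {
    "threads-max": "kernel/threads-max",
    "max_map_count": "vm/max_map_count",
    "pid_max": "kernel/pid_max",
}
TARGETS = {"threads-max": 120000, "max_map_count": 600000, "pid_max": 200000}


def read_kernel_limits(base_dir="/proc/sys"):
    """Return {name: int value}, or None for a file that does not exist."""
    values = {}
    for name, rel_path in KERNEL_LIMITS.items():
        try:
            with open(os.path.join(base_dir, rel_path)) as fh:
                values[name] = int(fh.read().strip())
        except FileNotFoundError:
            values[name] = None  # e.g. not running on Linux
    return values


def below_target(values, targets=TARGETS):
    """Limits that were read successfully but are lower than the target."""
    return {k: v for k, v in values.items() if v is not None and v < targets[k]}


if __name__ == "__main__":
    current = read_kernel_limits()
    print(current)
    print("still too low:", below_target(current))
```

Values written with `echo` into /proc/sys last only until the next reboot; making them permanent usually goes through /etc/sysctl.conf.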
Virtual Memory Configuration: Swap
- sudo fallocate -l 20G /swapfile
- sudo chmod 600 /swapfile
- sudo mkswap /swapfile
- sudo swapon /swapfile
- sudo swapon -s
- sudo nano /etc/fstab, then add this line so the swap file is enabled at boot:
  /swapfile none swap sw 0 0
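`swapon -s` prints the kernel's swap table from /proc/swaps. The following is a minimal Python sketch (hypothetical helper names) that parses that table and checks that /swapfile is active. It assumes the standard five columns (Filename, Type, Size, Used, Priority), with Size and Used in KiB.

```python
def parse_swaps(text):
    """Parse /proc/swaps (the table `swapon -s` prints) into a list of dicts."""
    entries = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        filename, swap_type, size, used, priority = line.split()
        entries.append({
            "filename": filename,
            "type": swap_type,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return entries


def swap_active(path="/swapfile", swaps_file="/proc/swaps"):
    """True if `path` appears in the active swap table."""
    try:
        with open(swaps_file) as fh:
            return any(e["filename"] == path for e in parse_swaps(fh.read()))
    except FileNotFoundError:
        return False  # no /proc/swaps, e.g. not running on Linux


if __name__ == "__main__":
    print("/swapfile active:", swap_active())
```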
Maximum number of open files
- ulimit -n
- sudo nano /etc/security/limits.conf, then add:
  * soft nofile 64000
  * hard nofile 64000
  root soft nofile 64000
  root hard nofile 64000
- sudo nano /etc/pam.d/common-session, then add:
  session required pam_limits.so
- sudo nano /etc/pam.d/common-session-noninteractive, then add:
  session required pam_limits.so
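A long-running Python worker can check the same limit that `ulimit -n` reports through the standard-library `resource` module. The following is a minimal sketch (the function name is hypothetical; 64000 is the value from limits.conf above) that raises the process's soft open-file limit toward the target without exceeding the hard limit, which is all an unprivileged process is allowed to do.

```python
import resource

TARGET_NOFILE = 64000  # the nofile value used in limits.conf above


def raise_nofile_limit(target=TARGET_NOFILE):
    """Raise this process's soft open-file limit toward `target`, capped at the hard limit.

    Returns (before, after) as (soft, hard) tuples; the hard limit is never changed.
    """
    before = resource.getrlimit(resource.RLIMIT_NOFILE)
    soft, hard = before
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < new_soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # the platform refused the value; keep the current limit
    after = resource.getrlimit(resource.RLIMIT_NOFILE)
    return before, after


if __name__ == "__main__":
    before, after = raise_nofile_limit()
    print("open files (soft, hard):", before, "->", after)
```

If the hard limit printed here is still below 64000, the limits.conf change has not reached this session yet; pam_limits applies it at login, so a fresh login is needed.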
Big Data Optimization: Tune Kafka Cluster
- buffer.memory: default
- batch.size: "655357"
- linger.ms: "5"
- compression.type: lz4
- retries: default
- send.buffer.bytes: default
- connections.max.idle.ms: default

- bootstrap.servers
- batch.size
- linger.ms
- connections.max.idle.ms = 10000
- compression.type
- retries
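The settings above use Kafka's dotted property names. The talk does not name a Python client; assuming kafka-python, its `KafkaProducer` takes the same settings as snake_case keyword arguments (`batch_size`, `linger_ms`, and so on). The following is a minimal sketch that converts one form to the other using the slide's values. The broker address is hypothetical, and settings marked "default" are left out so that the client defaults apply.

```python
# Producer settings from the slides, using Kafka's dotted property names.
# "broker1:9092" is a hypothetical bootstrap address.
PRODUCER_PROPS = {
    "bootstrap.servers": "broker1:9092",
    "batch.size": 655357,
    "linger.ms": 5,
    "compression.type": "lz4",
    "connections.max.idle.ms": 10000,
}


def to_kafka_python_kwargs(props):
    """Map dotted Kafka property names to kafka-python's snake_case keyword names."""
    return {key.replace(".", "_"): value for key, value in props.items()}


# Usage with kafka-python (not run here: it needs a reachable broker, and lz4
# compression needs the `lz4` package installed):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(**to_kafka_python_kwargs(PRODUCER_PROPS))

if __name__ == "__main__":
    print(to_kafka_python_kwargs(PRODUCER_PROPS))
```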
Spark Cluster Hyperparameter Tuning
1) Pass the settings on the command line:
./spark-shell \
  --conf spark.executor.memory=50g \
  --conf spark.driver.memory=150g \
  --conf spark.kryoserializer.buffer.max=256m \
  --conf spark.driver.maxResultSize=1g \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.rpc.askTimeout=300s \
  --conf spark.dynamicAllocation.minExecutors=5 \
  --conf spark.sql.shuffle.partitions=1024
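When the same tuning is applied from a Python job launcher, keeping the settings in one dict avoids hand-editing long command lines. The following is a minimal sketch (hypothetical helper names) that builds the `spark-shell` argument list above as an argv list, suitable for `subprocess.run`. The launcher path `./spark-shell` comes from the slide.

```python
import shlex

# Settings from the spark-shell command above.
SPARK_CONF = {
    "spark.executor.memory": "50g",
    "spark.driver.memory": "150g",
    "spark.kryoserializer.buffer.max": "256m",
    "spark.driver.maxResultSize": "1g",
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.rpc.askTimeout": "300s",
    "spark.dynamicAllocation.minExecutors": "5",
    "spark.sql.shuffle.partitions": "1024",
}


def spark_shell_args(conf, launcher="./spark-shell"):
    """Build an argv list: the launcher followed by one '--conf key=value' per setting."""
    args = [launcher]
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


def as_shell_command(args):
    """Quote the argv list as a single shell command string for logging."""
    return shlex.join(args)


if __name__ == "__main__":
    print(as_shell_command(spark_shell_args(SPARK_CONF)))
    # To launch it: subprocess.run(spark_shell_args(SPARK_CONF), check=True)
```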
2) Set configuration in spark-defaults.conf (in /usr/local/spark-1.6.1/conf):
spark.master spark://master.prod.chetan.com:7077
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled true
spark.history.fs.logDirectory file:/data/tmp/spark-events
#spark.eventLog.dir hdfs://namenode_host:namenode_port/user/spark/applicationHistory
spark.eventLog.dir file:/data/tmp/spark-events
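Before rolling a spark-defaults.conf out to every node, it can help to load it and check which settings are actually active (for example, that the commented-out HDFS event log directory is ignored). The following is a minimal, simplified Python parser (a hypothetical helper, not Spark's own loader). It follows the documented "key, whitespace, value" format and skips comment lines. Spark itself reads the file with Java Properties rules, which also accept `=` or `:` as separators.

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf content into a dict.

    Simplified: one 'key<whitespace>value' pair per line; blank lines and
    lines starting with '#' are skipped.
    """
    conf = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


if __name__ == "__main__":
    with open("/usr/local/spark-1.6.1/conf/spark-defaults.conf") as fh:
        for key, value in sorted(parse_spark_defaults(fh.read()).items()):
            print(key, "=", value)
```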
PySpark with Hadoop Demo: MapReduce with word count
>>> # Load the input file as an RDD of lines
>>> textFile = sc.textFile("file:///home/chetan306/inputfile.txt")
>>> textFile.count()   # number of lines
>>> textFile.first()   # first line
>>> # Map: split each line into words and emit (word, 1); Reduce: sum the counts per word
>>> wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
>>> wordCounts.collect()
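To see what the `flatMap`, `map`, and `reduceByKey` chain computes without a cluster, here is a pure-Python sketch of the same map, shuffle, and reduce steps on an in-memory list of lines. This is an illustration of the semantics only, not PySpark: Spark runs these steps in parallel across partitions.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Local equivalent of flatMap(split) -> map((word, 1)) -> reduceByKey(+)."""
    # flatMap: split every line into words and flatten into one stream
    words = chain.from_iterable(line.split() for line in lines)
    # map: emit a (word, 1) pair for every word
    pairs = ((word, 1) for word in words)
    # reduceByKey: group equal keys (the shuffle) and add their values
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


if __name__ == "__main__":
    print(word_count(["to be or", "not to be"]))
```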
Data Science in University Education Initiative
- Data Science Lab, Computer Science Department, University of Kachchh
- Machine Learning / Data Science with Python
Questions?
Resources
https://github.com/dskskv/pycon-india-2016
chetan@kutchuni.edu.in
Twitter: @khatri_chetan

 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
 
An Introduction to Spark with Scala
An Introduction to Spark with ScalaAn Introduction to Spark with Scala
An Introduction to Spark with Scala
 
HBase with Apache Spark POC Demo
HBase with Apache Spark POC DemoHBase with Apache Spark POC Demo
HBase with Apache Spark POC Demo
 
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
HKOSCon18 - Chetan Khatri - Open Source AI / ML Technologies and Application ...
 
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
HKOSCon18 - Chetan Khatri - Scaling TB's of Data with Apache Spark and Scala ...
 
Fossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatriFossasia 2018-chetan-khatri
Fossasia 2018-chetan-khatri
 
Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...Fossasia ai-ml technologies and application for product development-chetan kh...
Fossasia ai-ml technologies and application for product development-chetan kh...
 
An Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learningAn Introduction Linear Algebra for Neural Networks and Deep learning
An Introduction Linear Algebra for Neural Networks and Deep learning
 
Introduction to Computer Science
Introduction to Computer ScienceIntroduction to Computer Science
Introduction to Computer Science
 
An introduction to Git with Atlassian Suite
An introduction to Git with Atlassian SuiteAn introduction to Git with Atlassian Suite
An introduction to Git with Atlassian Suite
 
Think machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetanThink machine-learning-with-scikit-learn-chetan
Think machine-learning-with-scikit-learn-chetan
 
A step towards machine learning at accionlabs
A step towards machine learning at accionlabsA step towards machine learning at accionlabs
A step towards machine learning at accionlabs
 

Recently uploaded

[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
Vietnam Cotton & Spinning Association
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
1tyxnjpia
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
Vineet
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
Building a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdfBuilding a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdf
cjimenez2581
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
inaya7568
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
tzu5xla
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 

Recently uploaded (20)

[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Building a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdfBuilding a Quantum Computer Neutral Atom.pdf
Building a Quantum Computer Neutral Atom.pdf
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
Jio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdfJio cinema Retention & Engagement Strategy.pdf
Jio cinema Retention & Engagement Strategy.pdf
 
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理 原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
原版一比一爱尔兰都柏林大学毕业证(UCD毕业证书)如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 

Pycon 2016-open-space

  • 1. Big Data Architecture and Cluster Optimization with Python By: Chetan Khatri Principal Big Data Engineer, Nazara Technologies. Data Science & Machine Learning Curricula Advisor, University of Kachchh, Gujarat. Pycon India 2016
  • 2. Data Analytics Cycle
      - Understand the business
      - Understand the data
      - Cleanse the data
      - Analyze the data
      - Predict from the data
      - Visualize the data
      - Build insights that help grow business revenue
      - Explain them to executives (CxOs)
      - Make decisions
      - Increase revenue
  • 3. Capacity Planning (Cluster Sizing)
      - Telecom business:
          - 122 operators across 4 regions (India, Africa, Middle East, Latin America)
          - 12 TB of data per year
          - 11,00,000 (1.1 million) transactions per day
      - Gaming business:
          - 6 billion events per month, roughly 15 TB of data per year
      - Total: 27 TB of data per year
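The yearly figures on this slide can be turned into average rates with simple arithmetic. Below is a minimal sketch that assumes decimal terabytes (1 TB = 10^12 bytes), 30-day months and 365-day years. The slide specifies none of these, and the function names are illustrative. The results are averages only. Peak load, replication and headroom would all push the real cluster size higher.

```python
# Back-of-the-envelope sizing from the capacity-planning figures.
# Assumptions (not from the slide): 1 TB = 10**12 bytes, 30-day months, 365-day years.

SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12


def average_rate_per_second(count: float, period_days: float) -> float:
    """Average events (or transactions) per second over a period of days."""
    return count / (period_days * SECONDS_PER_DAY)


def bytes_per_record(tb_per_year: float, records_per_year: float) -> float:
    """Average stored size of one record implied by the yearly data volume."""
    return tb_per_year * TB / records_per_year


telecom_tps = average_rate_per_second(1_100_000, 1)           # 1.1 million transactions/day
gaming_eps = average_rate_per_second(6_000_000_000, 30)       # 6 billion events/month
gaming_event_size = bytes_per_record(15, 6_000_000_000 * 12)  # 15 TB/year over 72 billion events
total_tb_per_year = 12 + 15                                   # telecom + gaming

if __name__ == "__main__":
    print(f"Telecom average TPS: {telecom_tps:.1f}")
    print(f"Gaming average events/s: {gaming_eps:.0f}")
    print(f"Implied gaming event size: {gaming_event_size:.0f} bytes")
    print(f"Total storage: {total_tb_per_year} TB/year")
```

Under these assumptions the telecom load averages roughly 12.7 transactions per second, and the gaming load averages about 2,300 events per second. That works out to an implied average of about 208 bytes per gaming event.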
  • 5. Predictive Modeling Cycle
      1. Data quality: remove noisy and missing data.
      2. Feature engineering.
      3. Choose the best model based on the nature of the data. For example, use linear regression for a continuous target and logistic regression for a binary (binomial) categorical target. Random forests train on random samples of the data with feature randomization and tend to generalize better. Gradient-boosted trees form an optimized linear combination of trees, predicting with a weighted sum of the individual trees' predictions. Try everything from linear regression to deep learning (RNNs, CNNs).
      4. Ensemble models (regression + random forest + XGBoost).
      5. Tune hyperparameters (for a deep neural network, e.g., mini-batch size, learning rate, number of epochs, and number of hidden layers).
      6. Model compression: port the model to embedded/mobile devices by compressing its matrices (sparsify, shrink, break, quantize).
      7. Run the model on a smartphone.
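Step 6 names several matrix-compression techniques. The toy sketch below applies two of them to a flat list of weights: magnitude pruning, one common way to sparsify, and symmetric 8-bit linear quantization. It is not the author's pipeline. The helper names and the pruning threshold are illustrative assumptions. Real frameworks apply these steps per tensor and often fine-tune the model afterwards.

```python
# Illustrative helpers (hypothetical names) for two compression steps on a list of weights.


def sparsify(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def quantize_int8(weights):
    """Symmetric linear quantization to integer codes in [-127, 127].

    Returns the codes and the scale that maps them back to floats.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Approximate reconstruction of the original float weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.5, -1.0, 0.02, 0.25, -0.01]
    pruned = sparsify(w, threshold=0.05)   # small weights become exact zeros
    codes, scale = quantize_int8(pruned)   # 1 byte per weight instead of a float
    print(pruned)
    print(codes, scale)
    print(dequantize(codes, scale))
```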
  • 6. Big Data Cluster Tuning – OS Parameters
      - TPS (transactions per second): throughput of each job.
      - TCP TIME_WAIT interval: e.g., 4 minutes.
      - Maximum ports and maximum connections:
          sysctl net.ipv4.ip_local_port_range
          sysctl net.ipv4.tcp_fin_timeout
      - Maximum threads:
          sysctl -a | grep threads-max
          echo 120000 > /proc/sys/kernel/threads-max
          echo 600000 > /proc/sys
          cat /proc/sys/kernel/threads-max
      - Number of threads ≈ total virtual memory / (stack size × 1024 × 1024), with virtual memory in bytes and stack size in MB
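The kernel parameters on this slide are exposed as files under /proc/sys, which is what sysctl reads. The minimal Python sketch below inspects them and applies the thread-count formula. It assumes a Linux host, and `read_sysctl` and `estimate_max_threads` are hypothetical helper names. It only reads values, because changing them still needs root (via `echo` or `sysctl -w`). `ulimit -s` reports the stack size in KB, so the sketch multiplies by 1024 once. The slide's formula takes the stack size in MB, so it uses 1024 × 1024.

```python
from pathlib import Path
from typing import Optional

PROC_SYS = Path("/proc/sys")


def sysctl_path(name: str) -> Path:
    """Map a sysctl key such as 'kernel.threads-max' to its file under /proc/sys."""
    return PROC_SYS / name.replace(".", "/")


def read_sysctl(name: str) -> Optional[str]:
    """Read a kernel parameter like `sysctl <name>`; None if it is unavailable (e.g. non-Linux)."""
    try:
        return sysctl_path(name).read_text().strip()
    except OSError:
        return None


def estimate_max_threads(total_virtual_memory_bytes: int, stack_size_kb: int) -> int:
    """Rough upper bound on threads: virtual memory divided by per-thread stack size.

    `stack_size_kb` is in KB, as printed by `ulimit -s`.
    """
    return total_virtual_memory_bytes // (stack_size_kb * 1024)


if __name__ == "__main__":
    for key in ("kernel.threads-max", "net.ipv4.ip_local_port_range", "net.ipv4.tcp_fin_timeout"):
        print(key, "=", read_sysctl(key))
    # Hypothetical example: 64 GiB of virtual memory and 8 MiB (8192 KB) thread stacks.
    print(estimate_max_threads(64 * 1024**3, 8192))
```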
  • 7. java.lang.OutOfMemoryError: Java heap space!
      - List RAM: free -m
      - Storage: df -h
      - ulimit -s    # stack size
      - ulimit -v    # virtual memory
      - echo 120000 > /proc/sys/kernel/threads-max
      - echo 600000 > /proc/sys/vm/max_map_count
      - echo 200000 > /proc/sys/kernel/pid_max
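The same checks can be made from Python with the standard `resource`, `os` and `shutil` modules. The minimal sketch below assumes a Linux/Unix host, and `resource_report` is an illustrative name. Python reports the limits in bytes, whereas `ulimit -s` and `ulimit -v` print them in KB.

```python
import os
import resource
import shutil


def _limit(value: int) -> str:
    """Render an rlimit the way `ulimit` does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def resource_report(path: str = "/") -> dict:
    """Python-side view of the memory, limit and disk checks on this slide."""
    stack_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)        # bytes; `ulimit -s` shows KB
    vmem_soft, _ = resource.getrlimit(resource.RLIMIT_AS)            # bytes; `ulimit -v` shows KB
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")   # total RAM, like `free`
    disk = shutil.disk_usage(path)                                   # like one row of `df`
    return {
        "ram_total_mb": ram // (1024 * 1024),
        "stack_limit_bytes": _limit(stack_soft),
        "virtual_memory_limit_bytes": _limit(vmem_soft),
        "disk_total_gb": disk.total / 1024**3,
        "disk_free_gb": disk.free / 1024**3,
    }


if __name__ == "__main__":
    for key, value in resource_report().items():
        print(f"{key}: {value}")
```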
  • 8. Virtual Memory Configuration – Swap Configuration
      sudo fallocate -l 20G /swapfile
      sudo chmod 600 /swapfile
      sudo mkswap /swapfile
      sudo swapon /swapfile
      sudo swapon -s
      sudo nano /etc/fstab
      Add this line to /etc/fstab so the swap file is enabled again after a reboot:
      /swapfile none swap sw 0 0
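`swapon -s` prints the contents of /proc/swaps. The minimal sketch below parses that table so a provisioning script can confirm the swap file is active. It assumes Linux, and the function names and the sample line in the checks are illustrative.

```python
from pathlib import Path


def parse_proc_swaps(text: str) -> list:
    """Parse the table shown by `swapon -s` (the contents of /proc/swaps).

    Columns: Filename, Type, Size, Used, Priority; sizes are in KB.
    """
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        filename, swap_type, size_kb, used_kb, priority = fields[:5]
        entries.append({
            "filename": filename,
            "type": swap_type,
            "size_kb": int(size_kb),
            "used_kb": int(used_kb),
            "priority": int(priority),
        })
    return entries


def active_swap(path: str = "/proc/swaps") -> list:
    """Return the active swap areas, or an empty list if /proc/swaps cannot be read."""
    try:
        return parse_proc_swaps(Path(path).read_text())
    except OSError:
        return []
```

For example, `any(e["filename"] == "/swapfile" for e in active_swap())` becomes true once `swapon /swapfile` has succeeded.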
  • 9. Maximum number of open files
      ulimit -n
      sudo nano /etc/security/limits.conf, and add:
          * soft nofile 64000
          * hard nofile 64000
          root soft nofile 64000
          root hard nofile 64000
      sudo nano /etc/pam.d/common-session, and add:
          session required pam_limits.so
      sudo nano /etc/pam.d/common-session-noninteractive, and add:
          session required pam_limits.so
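The limits.conf entries take effect for new sessions through pam_limits. A running Python process can also raise its own soft limit, up to the hard limit, with the standard `resource` module. This is the per-process equivalent of `ulimit -n`. The sketch below is minimal: `raise_nofile_limit` is a hypothetical name, and 64000 is the slide's value.

```python
import resource

TARGET_NOFILE = 64000  # value used in limits.conf on the slide


def raise_nofile_limit(target: int = TARGET_NOFILE):
    """Raise this process's soft open-files limit toward `target`, capped at the hard limit.

    Only the current process and its children are affected; limits.conf is not touched.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, never lower it
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # platform-specific cap; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("open-files limit (soft, hard):", raise_nofile_limit())
```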
  • 10. Big Data Optimization: Tune the Kafka Cluster
      Producer settings used:
          buffer.memory: default
          batch.size: 655357
          linger.ms: 5
          compression.type: lz4
          retries: default
          send.buffer.bytes: default
          connections.max.idle.ms: default
      Key parameters to tune:
          bootstrap.servers
          batch.size
          linger.ms
          connections.max.idle.ms (e.g., 10000)
          compression.type
          retries
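From Python, one way to apply these producer settings is the kafka-python client. Its `KafkaProducer` accepts these settings as keyword arguments with the dots replaced by underscores. The minimal sketch below uses the tuned values from the slide, including connections.max.idle.ms = 10000. The broker address is a placeholder. Settings marked "default" are omitted, so the client defaults apply. The producer construction is left commented out because it needs a running broker.

```python
# Producer settings from the slide, written with Kafka's property names.
# "localhost:9092" is a placeholder broker address, not a value from the slide.
PRODUCER_CONFIG = {
    "bootstrap.servers": "localhost:9092",
    "batch.size": 655357,
    "linger.ms": 5,
    "compression.type": "lz4",
    "connections.max.idle.ms": 10000,
    # buffer.memory, retries and send.buffer.bytes are left at client defaults.
}


def to_kafka_python_kwargs(config: dict) -> dict:
    """Convert Kafka property names to kafka-python keyword names (linger.ms -> linger_ms)."""
    return {key.replace(".", "_"): value for key, value in config.items()}


if __name__ == "__main__":
    kwargs = to_kafka_python_kwargs(PRODUCER_CONFIG)
    print(kwargs)
    # With kafka-python installed (plus the lz4 package for lz4 compression):
    # from kafka import KafkaProducer
    # producer = KafkaProducer(**kwargs)
```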
  • 11. Spark Cluster Hyperparameter Tuning
      1) Pass the settings to spark-shell on the command line:
          ./spark-shell \
            --conf spark.executor.memory=50g \
            --conf spark.driver.memory=150g \
            --conf spark.kryoserializer.buffer.max=256 \
            --conf spark.driver.maxResultSize=1g \
            --conf spark.dynamicAllocation.enabled=true \
            --conf spark.shuffle.service.enabled=true \
            --conf spark.rpc.askTimeout=300s \
            --conf spark.dynamicAllocation.minExecutors=5 \
            --conf spark.sql.shuffle.partitions=1024
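When jobs are launched from scripts, it can help to keep these settings in one place and generate the `--conf` flags from them. The minimal sketch below uses the slide's values. `spark_conf_args` is an illustrative helper, and `shlex.join` requires Python 3.8 or later.

```python
import shlex

# Values from the spark-shell example on this slide.
SPARK_CONF = {
    "spark.executor.memory": "50g",
    "spark.driver.memory": "150g",
    "spark.kryoserializer.buffer.max": "256",  # interpreted as MiB when no unit is given
    "spark.driver.maxResultSize": "1g",
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.rpc.askTimeout": "300s",
    "spark.dynamicAllocation.minExecutors": "5",
    "spark.sql.shuffle.partitions": "1024",
}


def spark_conf_args(conf: dict) -> list:
    """Turn a settings dict into repeated `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    cmd = ["./spark-shell", *spark_conf_args(SPARK_CONF)]
    print(shlex.join(cmd))  # paste into a shell, or pass `cmd` to subprocess.run
```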
  • 12. Spark Cluster Hyperparameter Tuning
      2) Or set them in spark-defaults.conf, located in /usr/local/spark-1.6.1/conf
  • 13. Spark Cluster Hyperparameter Tuning
          spark.master                    spark://master.prod.chetan.com:7077
          spark.serializer                org.apache.spark.serializer.KryoSerializer
          spark.eventLog.enabled          true
          spark.history.fs.logDirectory   file:/data/tmp/spark-events
          #spark.eventLog.dir             hdfs://namenode_host:namenode_port/user/spark/applicationHistory
          spark.eventLog.dir              file:/data/tmp/spark-events
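spark-defaults.conf holds one `key value` pair per line. The minimal sketch below parses the slide's file. It also checks that the event-log directory matches the history server's log directory, because the history server only lists applications whose event logs it finds there. The helper names are illustrative. The parser handles only whitespace separators, while Spark reads the file as Java properties, which also accept `=` and `:`.

```python
SAMPLE_DEFAULTS = """\
spark.master                    spark://master.prod.chetan.com:7077
spark.serializer                org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled          true
spark.history.fs.logDirectory   file:/data/tmp/spark-events
#spark.eventLog.dir             hdfs://namenode_host:namenode_port/user/spark/applicationHistory
spark.eventLog.dir              file:/data/tmp/spark-events
"""


def parse_spark_defaults(text: str) -> dict:
    """Parse whitespace-separated `key value` lines; skip blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def history_dirs_match(conf: dict) -> bool:
    """True if applications write event logs where the history server reads them."""
    event_dir = conf.get("spark.eventLog.dir")
    return event_dir is not None and event_dir == conf.get("spark.history.fs.logDirectory")


if __name__ == "__main__":
    conf = parse_spark_defaults(SAMPLE_DEFAULTS)
    for key, value in conf.items():
        print(f"{key} = {value}")
    print("history server will see event logs:", history_dirs_match(conf))
```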
  • 15. PySpark with Hadoop Demo: MapReduce-style word count (run in the pyspark shell, where `sc` is the predefined SparkContext)
      >>> textFile = sc.textFile("file:///home/chetan306/inputfile.txt")  # RDD of lines
      >>> textFile.count()   # number of lines
      >>> textFile.first()   # first line
      >>> wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
      >>> wordCounts.collect()   # list of (word, count) pairs
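To show what the flatMap / map / reduceByKey chain computes, here is a single-process Python analogue that runs without Spark. It is a teaching sketch, not how Spark executes the job. Spark partitions the data and combines counts within each partition before shuffling. `word_count` is an illustrative name.

```python
from functools import reduce
from itertools import chain


def word_count(lines):
    """Local analogue of: flatMap(split) -> map(word -> (word, 1)) -> reduceByKey(+)."""
    words = chain.from_iterable(line.split() for line in lines)  # flatMap: lines -> words
    pairs = ((word, 1) for word in words)                        # map: word -> (word, 1)

    def add(acc, pair):                                          # reduceByKey: sum per word
        word, n = pair
        acc[word] = acc.get(word, 0) + n
        return acc

    return reduce(add, pairs, {})


if __name__ == "__main__":
    print(word_count(["to be or", "not to be"]))
```

Here `word_count(["to be or", "not to be"])` returns `{'to': 2, 'be': 2, 'or': 1, 'not': 1}`, the same pairs that `wordCounts.collect()` would return for a two-line file with that text, though possibly in a different order.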
  • 16. Data Science in University Education Initiative
      Data Science Lab, Computer Science Department – University of Kachchh.
  • 17. Data Science in University Education Initiative
      - Machine learning / Data Science with Python