Joining Facebook
New challenges ahead!
The System Design Primer Now on GitHub!
Learn how to design large-scale systems.
Data Wrangling GitHub Stats with Viz
Analyzing GitHub's most popular repos - data flow and data wrangling.
Python for Data Part 3: Functions
Functions as objects, lambda functions, closures, args, kwargs, currying, generators, generator expressions, and itertools, with a focus on usage for data analysis.
Python for Data Part 2: Data Structure Utils
Slice, range, xrange, bisect, sort, sorted, reversed, enumerate, zip, and list comprehensions, with a focus on usage for data analysis.
Python for Data Part 1: Data Structures
Tuples, lists, dicts, and sets, with a focus on usage for data analysis.
Predicting Titanic Survivors
Predict survivors through exploratory data analysis, data cleaning, and machine learning.
Apache Spark Tutorial
Hands-on introduction with PySpark.
Gitsome Now on GitHub!
A supercharged Git/shell autocompleter with GitHub integration.
Haxor-News Now on GitHub!
Browse Hacker News like a haxor.
Viz Now on GitHub!
Interactive visualizations and stats of GitHub's newest, most popular repos.
AWS-Shell Now on GitHub!
An interactive productivity booster for the AWS CLI.
Awesome AWS Now on GitHub!
A curated list of awesome Amazon Web Services (AWS) libraries, open source repos, guides, blogs, and other resources.
Under the Hood of SAWS, A Supercharged AWS CLI
A brief look at some Python open source technologies used in SAWS.
SAWS Now on GitHub!
Interactive command line interface that aims to supercharge the AWS CLI with features focusing on improving ease-of-use and increasing productivity. Under the hood, SAWS is powered by the AWS CLI and supports the same commands and command structure.
Dev Setup Now on GitHub!
Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based defaults for Mac OS X.
Interactive Coding Challenges Now on GitHub!
Interactive, test-driven Python coding challenges, focused on the algorithms and data structures typically found in coding interviews or coding competitions.
Data Science Python Notebooks Now on GitHub!
Continually updated Data Science Python Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, and various command lines.
Python Hadoop MapReduce: Analyzing AWS S3 Bucket Logs with mrjob
mrjob lets you write MapReduce jobs in Python 2.5+ and run them on several platforms.
Tableau 9 Features: Initial Impressions from Beta
With the final beta in the hands of testers, I thought I'd write up a review of my favorite features in Tableau 9.
Setting Up Splunk Enterprise for AWS
I recently hooked up Splunk to AWS to search, monitor, and analyze log files. Splunk indexes data on read, which then allows it to do super-fast searching and visualization.
A Brief Introduction to R Unit Testing with test_that
Testing is a vital part of software development. I've recently hooked up test_that to my R-Snippets repo.
Website Redesign and Jekyll Mirror
I've just completed overhauling donnemartin.com and its mirror site powered by Jekyll on GitHub Pages.
Speeding Up Hadoop MapReduce Jobs with S3DistCp
Input file size has a significant impact on the job length, due to the mapper setup time.
R Hands-On Tutorials with Swirl
Exploring R with the R package Swirl, which lets you learn right from the R console.
S3cmd: Frequently Used Commands
I’ve found S3cmd to be a great tool for interacting with S3 on AWS. S3cmd is written in Python, is open source, and is free even for commercial use.
My Reading List
I've started populating my Reading List. Updates will trickle in over the coming weeks.
Hooking Up Android Gradle and Travis CI
I’ve recently started using Gradle as the build system for my Android projects. Travis CI is a very popular continuous integration tool for open source projects.
Talk Data to Me: Tableau 2014 Conference
3 days. 5 keynotes. Dozens of breakout sessions. Countless tips, tricks, and brilliant data viz gurus.