Data flow and data wrangling for GitHub's most popular repos.
Functions as objects, lambda functions, closures, args, kwargs, currying, generators, generator expressions, and itertools, with a focus on usage for data analysis.
Slice, range, xrange, bisect, sort, sorted, reversed, enumerate, zip, and list comprehensions, with a focus on usage for data analysis.
Tuples, lists, dicts, and sets, with a focus on usage for data analysis.
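As a small taste of the Python topics listed above, here is a minimal sketch assuming Python 3 (where `range` takes the place of `xrange`). The `repos` data and the helper names `make_min_stars_filter` and `running_total` are hypothetical, invented for illustration only; they are not taken from the notebooks themselves.

```python
from bisect import bisect_left
from itertools import groupby, islice

# Hypothetical sample data: (language, star count) tuples in a list.
repos = [("Python", 120), ("Go", 80), ("Python", 45), ("Rust", 200), ("Go", 15)]


def make_min_stars_filter(threshold):
    """Closure: the returned function remembers `threshold`."""
    def has_enough_stars(repo):
        return repo[1] >= threshold
    return has_enough_stars


def running_total(values):
    """Generator: lazily yields the cumulative sum of `values`."""
    total = 0
    for value in values:
        total += value
        yield total


# Functions as objects plus a list comprehension.
is_popular = make_min_stars_filter(50)
popular = [repo for repo in repos if is_popular(repo)]

# sorted with a lambda key, then itertools.groupby into a dict.
by_language = sorted(repos, key=lambda repo: repo[0])
stars_per_language = {
    language: sum(stars for _, stars in group)
    for language, group in groupby(by_language, key=lambda repo: repo[0])
}

# A generator expression feeds sorted; bisect finds an insertion point.
sorted_stars = sorted(stars for _, stars in repos)
rank_of_100 = bisect_left(sorted_stars, 100)

# islice takes only the first two values from the generator.
first_two_totals = list(islice(running_total(sorted_stars), 2))

# A set comprehension collects the distinct languages.
languages = {language for language, _ in repos}

# enumerate and zip pair each position with a language and a star count.
names, stars = zip(*repos)
for position, (name, count) in enumerate(zip(names, stars)):
    print(position, name, count)
```

The generator and `islice` keep the cumulative sum lazy, which matters more on large datasets than in this toy example.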
Predict survivors through exploratory data analysis, data cleaning, and machine learning.
Hands-on introduction with PySpark.
A supercharged Git/shell autocompleter with GitHub integration.
Browse Hacker News like a haxor.
Interactive visualizations and stats of GitHub's newest, most popular repos.
An interactive productivity booster for the AWS CLI.
A curated list of awesome Amazon Web Services (AWS) libraries, open source repos, guides, blogs, and other resources.
A brief look at some Python open source technologies used in SAWS.
Interactive command line interface that aims to supercharge the AWS CLI with features focusing on improving ease-of-use and increasing productivity. Under the hood, SAWS is powered by the AWS CLI and supports the same commands and command structure.
Interactive, test-driven Python coding challenges (algorithms and data structures). Challenges focus on algorithms and data structures that are typically found in coding interviews or coding competitions.
Continually updated Data Science Python Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, and various command-line tools.
mrjob lets you write MapReduce jobs in Python 2.5+ and run them on several platforms.
With the final beta in the hands of testers, I thought I'd write up a review of my favorite features in Tableau 9.
I recently hooked up Splunk to AWS to search, monitor, and analyze log files. Splunk indexes data on read, which then allows it to do super-fast searching and visualization.
Testing is a vital part of software development. I've recently hooked up testthat to my R-Snippets repo.
I've just completed overhauling donnemartin.com and its mirror site powered by Jekyll on GitHub Pages.
Input file size has a significant impact on job duration, due to the mapper setup time.
Exploring R with the R package Swirl, which lets you learn right from the R console.
I’ve found S3cmd to be a great tool for interacting with S3 on AWS. S3cmd is written in Python, is open source, and is free even for commercial use.
I've started populating my Reading List. Updates will trickle in over the coming weeks.
I’ve recently started using Gradle as the build system for my Android projects. Travis CI is a very popular continuous integration tool for open source projects.