kaushiksokker · 9 years
Sokrati released Athena!
Sokrati has released Athena, our own data warehouse as a service.
Some history:
Sokrati deals with a lot of data (20TB+). Each team used to worry about scaling up its own databases, and they all faced similar problems. Setting up a new shard for a database or migrating data between shards was a nightmarish activity. On the business front, the analytics team would wait forever to fetch data, and had to come up with their own ways of analysing it, as it was impossible to load into standard tools.
That is when the idea struck: why don't we build a layer that eases life for all the developers? This layer should, in theory, handle infinite scale, shard and re-shard when needed, add or remove boxes depending on the size of the data, and archive data to an archive store.
Athena:
We came up with the idea of building a REST-based web service that would be simple to use and to scale. The service would have just three calls: Store, Fetch and Status. Store takes a table name, database name, schema and an S3 file containing the actual data as input, and inserts or updates the data. Fetch builds the right query from the given selectors and filters and saves the output to an S3 file. Since the service deals with huge amounts of data, we decided that both of these calls should run as offline jobs, and hence added a Status servlet to check whether a job is complete.
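To make the shape of the three calls concrete, here is a minimal in-memory sketch. It is not Athena's actual code: the class name `AthenaSketch`, the parameter names, the job states and the `run_pending` helper are all assumptions made for illustration, and no real S3 or database access happens.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class AthenaSketch:
    """Hypothetical in-memory stand-in for the Store, Fetch and Status calls."""

    jobs: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def _submit(self, kind, params):
        # Store and Fetch are offline jobs: the call only queues the work
        # and hands back a job id that the caller can poll later.
        job_id = f"job-{next(self._ids)}"
        self.jobs[job_id] = {"kind": kind, "params": params, "state": "PENDING"}
        return job_id

    def store(self, dbname, tablename, schema, s3_file):
        """Queue an insert/update of the rows in `s3_file`."""
        return self._submit("store", {"dbname": dbname, "tablename": tablename,
                                      "schema": schema, "s3_file": s3_file})

    def fetch(self, dbname, tablename, selectors, filters, output_s3_file):
        """Queue a query; the result would be written to `output_s3_file`."""
        return self._submit("fetch", {"dbname": dbname, "tablename": tablename,
                                      "selectors": selectors, "filters": filters,
                                      "output_s3_file": output_s3_file})

    def status(self, job_id):
        """Report whether a previously queued job is complete."""
        return self.jobs[job_id]["state"]

    def run_pending(self):
        # Stand-in for the background worker that would do the real work.
        for job in self.jobs.values():
            if job["state"] == "PENDING":
                job["state"] = "COMPLETE"


athena = AthenaSketch()
job = athena.store("ads", "clicks", {"id": "int"}, "s3://example-bucket/clicks.csv")
before = athena.status(job)
athena.run_pending()
after = athena.status(job)
```

The point of the design is that callers never block on a large load or query: they submit, keep the job id, and poll Status until the job is done.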
Choice of database:
We evaluated different databases: HBase, MySQL with sharding, and Amazon's Redshift. A columnar data store performs a lot better than MySQL for analytics queries, so it was a choice between HBase and Redshift. HBase is a column-oriented store, but we would have had to maintain that cluster ourselves, which is not an issue with Redshift. Hence we decided to go with Redshift as the database.
Redshift is a columnar store with a PostgreSQL-style interface for accessing data. It is a fast, fully managed, petabyte-scale data warehouse.
Some numbers:
Setup: 4 nodes, each node of type dw.hs1.xlarge. 
- Data load for 100 million rows: 3 minutes.
- Simple select query on one table: 17 seconds.
- Select query with group by and order by: 31 seconds.
- Select query with a join of 2 tables with a million rows each: 48 seconds.
- 100K upserts on a table with 10 million rows: 40 seconds.
Downside of using Redshift:
Even though Redshift has a lot of good things going for it, it is not a perfect system. It does not support upserts (insert, or update on duplicate key) out of the box, like many other columnar DBs. So we had to implement our own upsert layer based on primary keys.
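One common way to build an upsert on a store that lacks it is to load the incoming rows into a staging table, delete the matching keys from the target, and then insert everything from staging, all in one transaction. The sketch below shows that pattern; it is an assumption about how such a layer could work, not a description of our implementation, and it uses SQLite as a stand-in for Redshift so that it runs anywhere. The table and column names are made up.

```python
import sqlite3


def upsert(conn, table, staging, key):
    """Merge every row of `staging` into `table`, matching on column `key`.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    with conn:  # one transaction: commit on success, roll back on error
        # Drop the rows that are about to be replaced ...
        conn.execute(
            f"DELETE FROM {table} WHERE {key} IN (SELECT {key} FROM {staging})"
        )
        # ... then insert both the replacements and the brand-new rows.
        conn.execute(f"INSERT INTO {table} SELECT * FROM {staging}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (id INTEGER, total INTEGER)")
conn.execute("CREATE TABLE clicks_staging (id INTEGER, total INTEGER)")
conn.executemany("INSERT INTO clicks VALUES (?, ?)", [(1, 10), (2, 20)])
conn.executemany("INSERT INTO clicks_staging VALUES (?, ?)", [(2, 25), (3, 30)])

upsert(conn, "clicks", "clicks_staging", "id")
rows = conn.execute("SELECT id, total FROM clicks ORDER BY id").fetchall()
# rows is now [(1, 10), (2, 25), (3, 30)]: id 2 updated, id 3 inserted
```

Doing the delete and the insert in a single transaction matters: readers should never see the table with the old rows gone and the new ones not yet present.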
Athena is ready!
Athena is built so that its underlying database can be changed without any other team having to make code changes. It would all be seamless, since the API calls stay the same.
With Athena, a developer's life has become easier: they no longer worry about scaling up data. Only one team worries about databases.
Next steps:
We plan to implement a computation framework on top of Athena that reads data from Athena, runs a map-reduce job and stores the result back in Athena. This layer would take care of scheduling jobs so that the DB is never overloaded. Once this is rolled out, Sokrati analytics would run on four calls: Store, Fetch, Compute and Status.
Sokrati is now churning out data analytics at supersonic speed!
kaushiksokker · 11 years
Do what you love
Engineers very rarely get to work on what they really love. Typically it's just another job, or finishing off just another task assigned to them. Alternatively, engineers mould themselves to love what they do rather than do what they love. This can lead to mundane jobs and demotivated teams.
We at Sokrati follow a very different way of forming engineering teams. We encourage people to choose to work on what they like. This way teams are self-formed and self-motivated. This leads to on-time delivery and satisfied teams, a win-win for both management and employees.
How do we do this? The process begins at the start of each quarter. The entire engineering team gathers in a big room. Each tech lead presents the list of features their team will be working on, along with the resource requirements and the skill set needed. Engineers then choose the team they want to work on. This leads to self-selecting teams that know their deliverables for the quarter and are very focussed on them.
Conflicts can occur with this process if an engineer chooses multiple projects that he or she wants to work on. In such a case tie-breakers can be held. Last time we had a darts competition between tech leads to determine which team a particular engineer should work with.
Create a fun environment with self-motivated teams, and you are sure to have a successful product!
kaushiksokker · 12 years
Stay nervous, stay alert!
I was having a one-on-one session with one of the team members in the company when this struck me: stay nervous, stay alert. When you are nervous about something, you are more alert. Once you get more experienced at doing it, you tend to get into a comfort zone and are more likely to make mistakes. This applies to any walk of life, be it work, driving or playing a sport.
How often does a new driver cause an accident compared with an experienced one? Newly trained drivers are very alert. They always follow the rule book of driving and are less likely to make mistakes or cause accidents.
A similar principle applies to software engineering. When doing deployments for the first time, you note down all the steps, create a plan of execution and follow each step very closely. After a while, once it becomes a habit, you skip noting down the steps, thinking that it's an everyday job and you won't miss anything. That is a sign of dangerous times ahead. Don't ever get into that comfort zone. Stay nervous, stay alert!
kaushiksokker · 12 years
The art of writing good code
What is good code? For me, it is code with few bugs in production and code that is easily reusable. (I didn't say bug-free, because that is a very, very hard thing to achieve.) It is code that stays in production for some time without being refactored.
Over 10 years of professional life and 4 years of engineering, I have written code in different languages: C, C++, Java, Ruby, Perl and so on. One thing I have figured out is that coding is more about clarity of thought than about the language being used. Clarity of thought is the single most important thing for writing good code. You get better at coding with experience, as with most things. This is simply because you understand the product better, so your thinking is much clearer.
Ever tried applying some science, or an algorithm, to good coding practice? Here is one such attempt. Don't start coding immediately when a problem is thrown at you. Start by thinking about it, consider different approaches, and choose the best among them (this can be done on paper or in your mind). Design all the classes you want to have, the functions and so on, and then start coding. It will flow, just like writing an essay. Once you are done with coding and testing, take a break (read something, have a coffee, play a game of TT). Then review your code, and you might find a few mistakes right there. Taking a break is necessary because it breaks the chain of thought. Without it, you will very rarely find a mistake in your own code.
A word about making changes to existing code: think through the entire class structure before adding your patch to an existing class. Good code is often messed up by too much patchwork. If lots of changes are required, it is better to rewrite the whole app than to keep adding patches.
kaushiksokker · 12 years
Scaling it up (16-63-?)!
Scaling it up! This is probably the most widely faced issue at Sokrati. I explained how we tackled the database scaling issue in my previous blog post. Santosh has explained the app scaling issue in one of his posts: Distributed scheduler
The third scaling issue that we have constantly faced is scaling up office space. It is not as easy as adding a box. Here are pictures of how we have scaled up:
Our first office: 1 flat in Laxmi Saheb, Bhosale Nagar (team of 16 people)
Next office: Erawati, Baner Road (team of 63 people)
Latest one: Shree, Pimple Nilakh (?)
kaushiksokker · 12 years
DB Sharding
Startups with a web-based product typically start with a database that fits on a single instance. As the product and the client base grow, they start facing scaling issues: as the data grows, query time increases, insert time increases, and the data outgrows the instance. The tech team starts looking for scaling solutions: NoSQL databases, Hadoop-based solutions and what not. We at Sokrati went through this phase and finally decided to build our own sharding solution. Sharding is a simple concept: the data on a single MySQL instance (one shard) is kept within limits, and once the shard is full, or is about to become full, you bring up another instance and new data is populated on that shard.
Sokrati's sharding solution
[Image: architecture diagram of the sharding solution]
The sharding db has information about every database and the key on which its data is distributed. Any application that wants to access a database first contacts the sharding db (we have built a service over this db) with the key, to fetch the correct database to contact. The sharding db holds all the required information about that database (host, port, username, password, type of db).
The architecture shown above runs into a problem when the schema of the db changes. To solve this we came up with what we call the golden-copy database. This database stores nothing but the schema of the db, and each shard is a slave to it. Any schema change or user addition is done on the golden copy. This way, even if you have hundreds of shards, schema changes are not a problem.
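The lookup described above can be sketched as a small directory that maps a database name and key to connection details. This is only an illustration: `ShardInfo`, `ShardingDirectory` and the sample values are hypothetical names, and the sketch assumes each key is explicitly registered against a shard, which may differ from how keys are really distributed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardInfo:
    """Connection details the sharding db returns for one shard."""
    host: str
    port: int
    username: str
    password: str
    db_type: str


class ShardingDirectory:
    """Hypothetical in-memory stand-in for the sharding db service."""

    def __init__(self):
        self._by_key = {}

    def register(self, database, key, info):
        # Record which shard holds the data for this (database, key) pair.
        self._by_key[(database, key)] = info

    def lookup(self, database, key):
        # Applications call this first, then connect to the returned shard.
        try:
            return self._by_key[(database, key)]
        except KeyError:
            raise LookupError(
                f"no shard registered for {database!r} key {key!r}"
            ) from None


directory = ShardingDirectory()
directory.register(
    "campaigns", "client-42",
    ShardInfo("shard1.example.internal", 3306, "app", "secret", "mysql"),
)
shard = directory.lookup("campaigns", "client-42")
```

Because applications only ever hold a key and ask the directory where to go, data can be moved to another shard by updating a single entry.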
This sharding solution solves multiple problems:
- Data is distributed across multiple shards, making queries faster.
- Fancy databases such as NoSQL stores are not required, which avoids re-engineering all the apps.
- There is a single point of contact (the sharding db) for accessing any database, so slave credentials can be returned depending on the type of access (read-only accesses can go to the slave).
- Multiple slaves can be added to achieve load balancing.
- If a client has data-secrecy concerns and wants its data hosted separately, this is easily achieved with a separate shard for that client.
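As a rough illustration of the read/write routing and slave load balancing mentioned in the list, the sketch below hands out the master for writes and rotates through the slaves for reads. The class name, the `"read"` access label and the round-robin policy are assumptions chosen for the example, not a description of the real service.

```python
import itertools


class ShardEndpoints:
    """Hypothetical per-shard record: one master plus zero or more slaves."""

    def __init__(self, master, slaves=()):
        self.master = master
        # Cycle through the slaves so that reads are spread evenly.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def endpoint_for(self, access):
        # Read-only access can go to a slave; everything else hits the master.
        if access == "read" and self._slaves is not None:
            return next(self._slaves)
        return self.master


endpoints = ShardEndpoints("master-1", ["slave-1a", "slave-1b"])
reads = [endpoints.endpoint_for("read") for _ in range(3)]
write = endpoints.endpoint_for("write")
```

A shard without slaves simply serves reads from its master, so adding slaves later needs no change on the application side.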
We have set up multiple monitors that track the health of each database and update the sharding db. Once a database is more than 50% full, we create a new shard. The application is unaware of all this and keeps working as is.
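The 50% rule can be expressed as a small check that a monitor might run over the usage figures it collects. The function name, the shape of the `usage` mapping and the sample numbers are assumptions for illustration only.

```python
FULL_THRESHOLD = 0.5  # "more than 50% full" triggers a new shard


def shards_needing_new_shard(usage):
    """Return the names of shards that have crossed the threshold.

    `usage` maps shard name -> (used_bytes, capacity_bytes).
    """
    return [
        name
        for name, (used, capacity) in sorted(usage.items())
        if used / capacity > FULL_THRESHOLD
    ]


usage = {
    "shard-1": (60, 100),  # 60% full: over the threshold
    "shard-2": (50, 100),  # exactly 50%: not over yet
    "shard-3": (10, 100),
}
to_split = shards_needing_new_shard(usage)
```

Acting at half capacity leaves headroom to bring up the new shard before the old one actually fills.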
kaushiksokker · 12 years
I work in one of the hottest startups in India!!
I have been working with Sokrati for more than 3 years now. I have been there almost since the start of the company and have been part of all its ups and downs. It has been a great journey. To be recognised as one of the best startups in India felt great. All the effort that we have been putting into building a great technology company is paying off and getting recognised.
Sokrati has a technology platform and a services or business operations team (bizOps, as we call them) to look after clients. Whenever the bizOps guys do a good job, they get an appreciation mail from clients. Being part of the tech team, I kind of envy these guys, not because they get appreciation, but because tech team people don't get such appreciation. I had been wondering what the right appreciation or recognition for the tech team would be. Receiving this award from Rahul Khanna (MD, Canaan Partners) felt like the whole tech team was getting that appreciation.
Finally, a brief note about our technology: we have developed lots of interesting and challenging applications. The platform that we have is highly scalable. Scaling an app or a database is nothing more than adding another box (look out for more blog posts on those technologies). Building scalable apps is our forte. This is the first recognition Sokrati has got for its technology. Looking forward to more such events and more such awards ;-)